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(57) Abstract 

The present invention provides obfuscation techniques for enhancing software security. In one embodiment, a method for obfuscation 
techniques for enhancing software security includes selecting a subset of code (e.g., compiled source code of an application) to obfuscate, 
and obfuscating the selected subset of the code. The obfuscating includes applying an obfuscating transformation to the selected subset of 
the code. The transformed code can be weakly equivalent to the uiitransformed code. The applied transformation can be selected based on 
a dcsiied level of security (e.g., resistance to reverse engineering). The applied transformation can include a control transfonnation that can 
be cieating using opaque constructs, which can be constructed using aliasing and concurrenc>' techniques. Accordingly, the code can be 
obfuscated for enhanced software security based on a desired level of obfuscation (e.g., based on a desired potency, resilience, and cost). 
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OBFUSCATION TECHNIQUES FOR ENHANCING SOFTWARE SECURITY 
5 FTRT.n OF THE INVENTION 

The present invention relates to methods and 
apparatus for preventing, or at least hampering, 
interpretation, decoding, or reverse engineering of 
software. More particularly, although not exclusively, 

10 the present invention relates to methods and apparatus 
for increasing the structural and logical complexity of 
software by inserting, removing, or rearranging 
identifiable structure or information from the software 
in such a way as to exacerbate the difficulty of the 

15 process of decompilation or reverse engineering. 

RACKGROUND 

The nature of software renders it suscepcible to 
analysis and copying by third parties. There have been 

20 considerable efforts to enhance software security, 
which have met with mixed success. Such security 
concerns relate to the need to prevent unauthorized 
copying of software and a desire to conceal programming 
techniques in which such techniques can be determined 

25 via reverse engineering. 

Established legal avenues, such as copyright, 
provide a measure of legislative protection. However, 
enforcing legal rights created under such regimes can 
be expensive and time consuming. Further, the 

3 0 protection afforded to software under copyright does 
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not cover programming techniques . Such techniques 
(i.e., the function as opposed to the form of the 
software) are legally difficult to protect. A reverse 
engineer could escape infringement by rewriting the 
5 relevant software, ab initio, based on a detailed 

knowledge of the function of the software in question. 
Such knowledge can be derived from analyzing the data 
structures, abstractions, and organization of the code. 
Software patents provide more comprehensive 

10 protection. However, it is clearly an advantage to 
couple legal protection of software with technical 
protection. 

Previous approaches to the protection cf 
proprietary software have either used encryption-based 

15 hardware solutions or have been based on simple 

rearrangements of the source code structure. Hardware- 
based techniques are non- ideal in that they are 
generally expensive and are tied to a specific platform 
or hardware add-on. Software solutions typically 

20 include trivial code obfuscators, such as the Crema 
obfuscator for Java™. Some obfuscators target the 
lexical structure of the application and typically 
remove source code formatting and comments and rename 
variables. However, such an obfuscation technique does 

25 not provide sufficient protection against malicious 

reverse engineering: reverse engineering is a problem 
regardless of the form in vvhich the software is 
distributed. Further, the problem is exacerbated when 
the software is distributed in hardware- independent 

30 formats that retain much or all of the information in 
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the original source code. Examples of such formats are 
Java™ bytecode and the Architecture Neutral 
Distribution Format (ANDF) . 

Software development can represent a significant 
5 investment in time, effort, and skill by a programmer. 
In the commercial context, the ability to prevent a 
competitor from copying proprietary techniques can be 
critical . 

10 SUMMARY 

The present invention provides methods and 
apparatus for obfuscation techniques for software 
security, such as computer itaplemented methods for 
reducing the susceptibility of software to reverse 

15 engineering (or to provide the public with a useful 
choice) . In one embodiment, a computer implemented 
method for obfuscating code, includes testing for 
completion of supplying one or more obfuscation 
transformations to the code, selecting a subset of the 

20 code to obfuscate, selecting an obfuscating transform 
to apply, applying the transformation, and returning to 
the completion testing step. 

In an alternative erribodiment , the present 
invention relates to a method of controlling a computer 

25 so that software running on, stored on, or manipulated 
by the computer exhibits a predetermined and controlled 
degree of resistance to reverse engineering, including 
applying selected obfuscating transformations to 
selected parts of the software, in which a level of 

30 obfuscation is achieved using a selected obfuscation 
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transformation so as to provide a recjuired degree of 
resistance to reverse engineering, effectiveness in 
operation of the software and size of transformed 
software, and updating the software to reflect the 
5 obfuscating transformations. 

In a preferred embodiment, the present invention 
provides a computer implemented method for enhancing 
software security, including identifying one or more 
source code input files corresponding to the source 

10 software for the application to be processed, selecting 
a required level of obfuscation (e.g., potency), 
selecting a maximum execution time or space penalty 
(e.g., cost), reading and parsing the input files, 
optionally along with any library or supplemental files 

15 read directly or indirectly by the source code, 

providing information identifying data types, data 
structures, and control structures used by the 
application to be processed, and constructing 
appropriate tables to store this information, 

20 preprocessing information about the application, in 
response to the preprocessing step, selecting and 
applying obfuscating code transformations to source 
code objects, repeating the obfuscating code 
transformation step until the required potency has been 

25 achieved or the maximum cost has been exceeded, and 
outputting the transformed software. 

Preferably, the information about the application 
is obtained using various static analysis techniques 
and dynamic analysis techniques. The static analysis 

30 techniques include inter-procedural dataflow analysis 
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and data dependence analysis. The dynamic analysis 
techniques include profiling, and optionally, 
information can be obtained via a user. Profiling can 
be used to determine the level of obfuscation, which 
5 can be applied to a particular source code object. 
Transformations can include control transformations 
created using opaque constructs in which an opaque 
construct is any mathematical object that is 
inexpensive to execute from a performance standpoint, 

10 simple for an obfuscator to construct, and expensive 
for a deobfuscator to break. Preferably, opaque 
constructs can be constructed using aliasing and 
concurrency techniques. Information about the source 
application can also be obtained using pragmatic 

15 analysis, which determines the nature of language 
constructs and programming idioms the application 
contains . 

The potency of an obfuscation transformation can 
be evaluated using software complexity metrics. 

20 Obfuscation code transformations can be applied to any 
language constructs: for example, modules, classes, or 
subroutines can be split or merged; new control and 
data structures can be created; and original control 
and data structures can be modified. Preferably, the 

25 new constructs added to the transformed application are 
selected to be as similar as possible to those in the 
source application, based on the pragmatic information 
gathered during preprocessing. The method can produce 
subsidiary files including information about which 

30 obfuscating transformations have been applied and 
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information relating obfuscated code of the transformed 
application to the source software. 

Preferably, the obfuscation transformations are 
selected to preserve the observable behavior of the 
5 software such that if P is the untransf ormed software, 
and P' is the transformed software, P and P' have the 
same observable behavior. More particularly, if P 
fails to terminate or terminates with an error 
condition, then P' may or may not terminate, otherwise 

10 P' terminates and produce the same output as P. 

Observable behavior includes effects experienced by a 
user, but P and P' may run v/ith different detailed 
behavior unobservable by a user. For example, detailed 
behavior of P and P' that can be different includes 

15 file creation, memory usage, and network communication. 
In one embodiment, the present invention also 
provides a deobfuscating tool adopted to remove 
obfuscations from an obfu3cat:*^d application by use of 
slicing, partial evaluation, dataflow analysis, or 

20 statistical analysis. 

BRIEF DESCRIPTION QF THE DRiWINGS 

The present invention will now be described by way 
of example only and with reference to the drawings in 
25 which: 

FIG. 1 illustrates a data processing system in 
accordance with the teachings of the present invention; 

FIG. 2 illustrates a classification of software 
protection including categories of obfuscating 
30 transformations ; 
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FIGs. 3a and 3b show techniques for providing 
software security by (a) server- side execution and (b) 
partial server-side execution; 

FIGs. 4a and 4b show techniques for providing 
5 software security by (a) using encryption and (b) using 
signed native code; 

FIG. 5 shows a technique for providing software 
security through obfuscation; 

FIG. 6 illustrates the architecture of an example 
10 of an obfuscating tool suitable for use with Java™ 
applications; 

FIG. 7 is a table thac tabulates a selection of 
known software complexity metrics; 

FIGs. 8a and 8b illustrate the resilience of an 
15 obfuscating transformation; 

FIG. 9 shows different types of opaque predicates; 

FIGs. 10a and 10b provide examples of (a) trivial 
opaque constructs and (b) weak opaque constructs; 

FIG. 11 illustrates an example of a computation 
20 transformation (branch insertion transformation) ; 

FIGs. 12a through 12d illustrate a loop condition 
insertion transformation; 

FIG. 13 illustrates a transformation that 
transforms reducible flowgraphs into non-reducible 
25 flowgraphs; 

FIG. 14 shows that a section of code can be 
parallelized if it contains no data dependencies; 

FIG. 15 shows that a section of code that contains 
no data dependencies can be split into concurrent 

7 

SUBSTITUTE SHEET (RULE 26) 



wo 99/01815 



PCT/US98/12017 



threads by inserting appropriate synchronization 
primitives; 

FIG. 16 shows how procedures P and Q are inlined 
at their call -sites and then removed from the code; 
5 FIG, 17 illustrates inlining method calls; 

FIG. 18 shows a technique for interleaving two 
methods declared in the same class; 

FIG. 19 shows a technique for creating several 
different versions of a method by applying different 
10 sets of obfuscating transformations to the original 
code ; 

FIGs. 20a through 20c provide examples of loop 
transformations including (a) loop blocking, (b) loop 
unrolling, and (c) loop fission; 
15 FIG. 21 shows a variable splitting example; 

FIG. 22 provides a funccion constructed to 
obfuscate strings *^AAA" , "-'BMiAA" , and ''CCB" ; 

FIG. 23 shows an example merging two 32 -bit 
variables x and y into one -bit variable Z; 
20 FIG. 24 illustrates an example of a data 

transformation for array restructuring; 

FIG. 25 illustrates modifications of an 
inheritance hierarchy; 

FIG, 26 illustrates opaque predicates constructed 
25 from objects and aliases; 

FIG. 27 provides an example of opaque constructs 
using threads; 

FIGs. 28a through 28d illustrate obfuscation vs. 
deobfuscation in which {a) shows an original program 
30 including three statements, Si. 3, being obfuscated, (b) 
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shows a deobfuscator identifying '"constant" opaque 
predicates, (c) shows the deobfuscator determining the 
common code in the statements, and (d) shows the 
deobfuscator applying some final simplifications and 
5 returning the program to its original form; 

FIG. 29 shows an architecture of a Java™ 
deobfuscation tool; 

FIG. 3 0 shows an example of statistical analysis 
used for evaluation; 
10 FIGs. 31a and 31b provide tables of an overview of 

various obfuscating transforras; and 

FIG. 3 2 provides an overview of various opaque 
constructs . 



15 DETAILED DESCRIPTION 

The following description will be provided in the 
context of a Java obfuscation tool, which is currently 
being developed by the applicants. However, it will be 
apparent to one of ordinary sskill in the art that the 

20 present techniques are applicable to other programming 
languages and the invention is not to be construed as 
restricted to Java™ applications. The implementation 
of the present invention in the context of other 
programming languages is considered to be within the 

25 purview of one of ordinary skill in the art. The 
exemplary embodiment that follows is, for clarity, 
specifically targeted at a Java™ obfuscating tool. 

In the description below, the following 
nomenclature will be used. F is the input application 

30 to be obfuscated; P' is the transformed application; T 
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is a transformation such that T transforms P into P' . 
P(T)P' is an obfuscating transformation if P and P' " 
have the same observable behavior. Observable behavior 
is defined generally as behavior experienced by the 
5 user. Thus, P' may have side effects such as creating 
files that P does not, so long as these side effects 
are not experienced by the user. P and P' do not need 
to be equally efficient. 

10 Exemplary Hardware 

FIG. 1 illustrates a data processing system in 
accordance with the teachings of the present invention. 
FIG. 1 shows a computer 100, which includes three major 
elements. Computer 100 includes an input/output (1/0) 

15 circuit 120, which is used to communicate information 
in appropriately structured form to and from other 
portions of computer ICO. Computer 100 includes a 
control processing unit (CPU) 130 in communication with 
I/O circuit 120 and a memory 140 (e.g., volatile and 

20 non-volatile memory) . These elements are ehose 

typically found in most general purpose computers and, 
in fact, computer 100 is intended to be representative 
of a broad category of data processing devices . A 
raster display monitor 16 0 is shown in communication 

25 with I/O circuit 120 and issued to display images 
generated by CPU 130. Any well known variety of 
cathode ray tiabe (CRT) or other type of display can be 
used as display 160. A ccnveritional keyboard 150 is 
also shown in communication with I/O 120. It will be 

30 appreciated by one of ordinary skill in the art that 
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computer 10 0 can be part of a larger system. For 
example, computer 100 can also be in comniunication with 
a network (e.g,, connected to a local area network 
(LAN) ) . 

5 In particular, computer 100 can include 

obfuscating circuitry for enhancing software security 
in accordance with the teachings of the present 
invention, or as will be appreciated by one of ordinary 
skill in the art, the present invention can be 

10 implemented in software executed by computer 100 (e.g., 
the software can be stored in memory 140 and executed 
on CPU 130) . For example, an unobfuscated program P 
(e.g., an application), stored in memory 140, can be 
obfuscated by an obfuscatcr executing on CPU 13 0 to 

15 provide an obfuscated program P' , stored in memory 140, 
in accordance with one einbcdiment of the present 
invention. 



Overview of the Detailed Description 
20 FIG. 6 shows the architecture of a Java™ 

obfuscator. According to the inventive method, Java™ 
application class files are passed along with any 
library files. An inheritance tree is constructed as 
well as a symbol table, providing type information for 
25 all symbols and control flow graphs for all methods. 

The user may optionally provide profiling data files as 
generated by Java*^ profiling tools. This information 
can be used to guide the obfuscator to ensure that 
frequently executed parts of the application are not 
30 obfuscated by very expensive cransf ormations . 
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Information is gathered about the application using 
standard compiler techniques such as interprocedural 
dataflow analysis and data dependence analysis. Some 
can be provided by the user and some by specialized 
5 techniques. The information is used to select and 
apply the appropriate code transformations. 

Appropriate transformations are selected. The 
governing criteria used in selecting the most suitable 
transformation include the requirement that the chosen 

10 transformation blend in naturally with the rest of the 
code. This can be deal'c v/ith by favoring 
transformations with a high appropriateness value. A 
further requirement is that transformations which yield 
a high level of obfuscaticn with low execution time 

15 penalty should be favored. This latter point is 

accomplished by selecting transformations chat maximize 
potency and resilience, and minimize cost. 

An obfuscation priority is allocated to a source 
code object. This will reflect how important it is to 

20 obfuscate the contents of the source code object. For 
example, if a particular source code object contains 
highly sensitive proprietary laaterial, then the 
obfuscation priority will be high. An execution time 
rank is determined for each u\ethod, which equals 1 if 

25 more time is spent executing the method than any other. 

The application is then obfuscated by building the 
appropriate internal data scructures, the mapping from 
each source code object to the appropriate 
transformation, the obfuscation priority, and the 

30 execution time rank. The obfuscating transformations 

-12" 
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are applied until the required level of obfuscation has 
been achieved or until the ma.ximum execution time 
penalty is exceeded. The transformed application is 
then written. 

5 The output of the obfuscation tool is a new 

application that is functionally equivalent to the 
original. The tool can also produce Java™ source files 
annotated with information about which transformations 
have been applied and how the obfuscated code relates 

10 to the original application, 

A number of examples of obfuscating 
transformations will now be described, again in the 
context of a Java™ obfuscator. 

Obfuscating transformations can be evaluated and 

15 classified according to their quality. The quality of 
a transformation can be expressed according to its 
potency, resilience, and cost. The potency of a 
transformation is related co how obscure P' is in 
relation to P. Any such metric will be relatively 

20 vague as it necessarily depends on human cognitive 

abilities. For the present purposes it is sufficient 
to consider the potency of a transformation as a 
measure of the usefulness of the transformation. The 
resilience of a transf orinauion measures how well a 

25 transformation holds up to an attack from an automatic 
deobf uscator . This is a combination of two factors: 
programmer effort and deobfuscator effort. Resilience 
can be measured on a scale from trivial to one-way. 
One-way transformations are extreme in that they cannot 

30 be reversed. The third component is transformation 

"13- 
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execution cost. This is the execution time or space 
penalty incurred as a result of using the transformed 
' application P' . Further details of transformation 
evaluation are discussed below in the detailed 
5 description of the preferred embodiments. The main 
classification of obfuscating transformations is shown 
in FIG. 2c with details given in FIGs. 2e through 2g. 

Examples of obfuscating transforms are as follows: 
Obfuscating transforms may be categorized as follows: 
10 control obfuscation, data obfuscations , layout 
obfuscations, and preventive obfuscations . Some 
examples of these are discussed below. 

Control obfuscations include aggregation 
transformations, ordering tr-ansformations, and 
15 computation transformations . 

Computation transformations include: concealing 
real control flow behind irrelevant non- functional 
statements; introducing code sequences at the object 
code level for which there exist no corresponding high- 
20 level language constructs; and removing real control 
flow abstractions or introducing spurious ones. 

Considering the first classification (control 
flow) , the Cyclomatic and Nesting complexity metrics 
suggest that there is a strong correlation between the 
25 perceived complexity of a piece of code and the number 
of predicates it contains. Opaque predicates enable 
the construction of transforiTiations which introduce new 
predicates into the program. 

Referring to FIG. 11a, an opaque predicate is 
30 inserted into the basic block S where S = Si...Sn. This 

-14" 
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splits S in half. The p"^ predicate is irrelevant code, 
because it will always evaluate to True. In FIG. lib, 
S is again broken into tv;o halves, which are 
transformed into two different obfuscated versions 
5 and S^. Therefore, it will not be obvious to a reverse 
engineer that and perform the same function. FIG. 
11c is similar to FIG. lib, however, a bug is 
introduced into S^. The P" predicate always selects the 
correct version of the code, S^. 

10 Another type of obfuscation transformation is a 

data transformation. An example of a data 
transformation is deconstructing arrays to increase the 
complexity of code. An array can be split into several 
subarrays, two or more arrays can be merged into a 

15 single array, or the dimensions of an array can be 

increased (flattening) or decreased (folding) . FIG. 24 
illustrates a number of examples of array 
transformations. In statements (1-2), an array A is 
split into two subarrays Al and A2 . Al contains 

20 elements with even indices and A2 contains elements 

with odd indices. Statements (3-4) illustrate how two 
integer arrays B and C can be interleaved to produce an 
array BC. The elements from B and C are evenly spread 
throughout the transformed array. Statements (6-7) 

25 illustrate folding of array D into array Dl . Such 
transformations introduce previously absent data 
structure or remove existing data structure. This can 
greatly increase the obscurity of the program as, for 
example, in declaring a 2 -dimensional array a 

3 0 programmer usually does so for a purpose, with the 
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chosen structure mapping onto the corresponding data. 
If that array is folded into a 1-d structure, a reverse 
engineer would be deprived of valuable pragmatic 
information. 

5 Another example of an obfuscating transformation 

is a preventive transformation. In contrast to control 
or data transformations, the main goal of preventive 
transformations is not to obscure the program to a 
human reader, but to make knovm automatic deobf uscation 

10 techniques more difficult or to exploit known problems 
in current deobf uscators or decompilers . Such 
transformations are known as inherent and targeted, 
respectively. An example of an inherent preventive 
transformation is reordering a for-loop to run 

15 backward. Such reordering is possible if the loop has 
no loop-carried data dependencies. A deobf uscator 
could perform the same analysis and reorder the loop to 
forward execution. However, if a bogus data dependency 
is added to the reversed loop, the identification of 

20 the loop and its reordering would be prevented. 

Further specific examples of obfuscating 
transformations are discussed below in the detailed 
description of the preferred embodiments. 

25 Detailed Description of the Preferred Embodiments 

It has become more and more common to 
distribute software in forms that retain most or 
all of the information present in the original 
source code. An important example is Java 

3 0 bytecode . Because such codes are easy to 
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decompile, they increase the risk of malicious 
reverse engineering attacks . 

Accordingly, several techniques for technical 
protection of software secrets are provided in 
5 accordance with one embodiment of the present 
invention. In the detailed description of the 
preferred embodiments, we will argue that 
automatic code obfuscation is currently the most 
viable method for preventing reverse engineering. 

10 We then describe the design of a code obfuscator, 
an obfuscation tool that converts a program into 
an equivalent one that is more difficult to 
understand and reverse engineer. 

The obfuscator is based on the application of 

15 code transformations, in many cases similar to 
those used by compiler optimizers. We describe a 
large number of such transformations, classify 
them, and evaluate them with respect to their 
potency (e.g., To what degree is a human reader 

20 confused?), resilience (e.g., How well are 

automatic deobfuscation attacks resisted?) , and 
cost (e.g.. How much performance overhead is added 
to the application?) . 

We finally describe various deobfuscation 

25 techniques (such as progx^am slicing) and possible 
countejrmeasures an obfuscator could employ against 
them. 

1 Introduction 

"17" 



SUBSTITUTE SHEET (RULE 26) 



wo 99/01815 



PCT/US98/12017 



Given enough time, effort, and determination, 
a competent programmer will always be able to 
reverse engineer any application. Having gained 
physical access to the application, the reverse 
5 engineer can decompile it (using disassemblers or 
decompilers) and then analyze its data structures 
and control flow. This can either be done manually 
or with the aid of reverse engineering tools, such 
as program slicers. 

10 Reverse engineering is not a new problem. 

Until recently, however, it i:i a problem that has 
received relatively little attention from software 
developers, because most programs are large, 
monolithic, and shipped as stripped, native code, 

15 making them difficult (although never impossible) 
to reverse engineer. 

However, this situation is changing as ic is 
becoming more and more conimon to distribute 
software in forms that are easy to decompile and 

20 reverse engineer. Impor'cant examples include Java 
bytecode and the Architeccure Neutral Distribution 
Format (ANDF) . Java applications in particular 
pose a problem to software developers . They are 
distributed over the Inter^net as Java class files, 

25 a hardware - independent vi:ruual machine code that 
retains virtually all the information of the 
original Java source. Hence, these class files are 
easy to decompile. Moreover, because much of the 
computation takes place in standard libraries. 
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Java programs are often small in size and 
therefore relatively easy to reverse engineer. 

The main concern of Java developers is not 
outright reengineering of entire applications. 
5 There is relatively little value in such behavior, 
because it clearly violates copyright law [29] and 
can be handled through litigation. Rather, 
developers are mostly frightened by the prospect 
of a competitor being able to extract proprietary 

10 algorithms and data structures from their 

applications in order to incorporate them into 
their own programs. Not only does it give the 
competitor a commercial edge (by cutting 
development time and cost) , but it is also 

15 difficult to detect and pursue legally. The 
last point is particularly valid for small 
developers who may ill afford lengthy legal 
battles against powerful corporations [22] with 
unlimited legal budgets. 

20 An overview of various forms of protection 

for providing legal protection or security for 
software is provided in FIG. 2. FIG. 2 provides a 
classification of (a) kinds of protection against 
malicious reverse engineering, (b) the quality of 

25 an obfuscating transformatioii, (c) information 
targeted by an obfuscating transformation; (d) 
layout obfuscations, (e) data obf uscations , (f) 
control obfuscations, and (g) preventive 
obfuscations. 
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The various forms of technical protection of 
intellectual property, which are available to 
software developers are discussed below. We will 
restrict our discussion to Java programs 
5 distributed over the Internet as Java class -files, 
although most of our results v/ill apply to other 
languages and architecture-neutral formats as 
well, as will be apparent to one of ordinary skill 
in the art. We will argue that the only 

10 reasonable approach to the protection of mobile 
code is code obfuscaticn. We will furthermore 
present a number of obfuscating transformations, 
classify them according to effectiveness and 
efficiency, and show how they can be put to use in 

15 an automatic obf uscation tooi . 

The remainder of the detailed description of 
the preferred embodiments is structured as 
follows. In Section 2, v/e give an overview of 
different forms of technical protection against 

20 software theft and argue that: code obf uscation 

currently affords the mos^c economical prevention. 
In Section 3, we give a brief overview of the 
design of Kava, a code obfuscaror for Java, which 
is currently under construction. Sections 4 and 5 

25 describe the criteria we use to classify and 
evaluate different types of obfuscating 
transformations. Sections 6, 7, 8, and 9 present 
a catalogue of obfuscating transformations- In 
Section 10, we give more decailed obf uscation 

30 algorithms. In Section 11, Vve conclude with a 
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summary of our results and a discussion of future 
directions of code obfuscation. 

2 Protecting Intellectual Property 
5 Consider the following scenario. Alice is a 

small software developer who wants to make her 
applications available to users over the Internet, 
presumably at a charge. Bob is a rival developer 
who feels that he could gain a commercial edge 

10 over Alice if he had access to her application's 
key algorithms and data structures. 

This can be seen as a tv/o -player game between 
two adversaries: the software developer (Alice) 
who tries to protect her code from attack, and the 

15 reverse engineer (Bob) whose task it is to analyze 
the application and convert ic into a form that is 
easy to read and understand. Note that it is not 
necessary for Bob to converc the application back 
to something close to Alice's original source; all 

20 that is necessary is that the reverse engineered 
code be understandable by Bob and his programmers. 
Note also that it may noc be necessary for Alice 
to protect her entire application from Bob; it 
probably consists mostly of ''bread-and-butter 

25 code" that is of no real interest to a competitor. 

Alice can protect her code from Bob's attack 
using either legal or technical protection, such 
as shown in FIG. 2a, which is discussed above. 
While copyright law does cover software artifacts, 

30 economic realities make it difficult for a small 
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company like Alice's to enforce the law against a 
larger and more powerful competitor. A more 
attractive solution is for Alice to protect her 
code by making reverse engineering so technically 
5 difficult that it becomes impossible or at the 
very least economically inviable. Some early 
attempts at technical protection are described by 
Gosler. (James R. Goslex . Software protection: 
Myth or reality? In CRYPTO* 35 Advances in 

10 Cryptology, pages 140- -157, August 1985) . 

The most secure approcich is for Alice not to 
sell her application at all, but rather sell its 
services. In other words, users never gain access 
to the application itself but rather connect to 

15 Alice's site to run the program remotely as shown 
in FIG. 3a, paying a small amount of electronic 
money every time. The advantage to Alice is that 
Bob will never gain physical access to the 
application and hence will not be able to reverse 

20 engineer it. The downside is of course that, due 
to limits on network bandwidth and latency, the 
application may perform much worse than if it had 
run locally on the user's sice. A partial 
solution is to break the application into two 

25 parts: a public part that runs locally on the 

user's site, and a private part (that contains the 
algorithms that Alice wants to protect) that is 
run remotely, for example, as shown in FIG. 3b. 
Another approach would be for Alice to 

3 0 encrypt her code before it is sent off to the 
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users, for example, as shown in FIG. 4a. 
Unfortunately, this only works if the entire 
decryption/execution process takes place in 
hardware. Such systems are described in Herzberg 
5 (Amir Herzberg and Shlomit S. Pinter. Public 
protection of software. ACM Transactions on 
Computer Systems, 5 (4 ) : 371- -393 , November 1937.) 
and Wilhelm (Uwe G. Wilhelrn. Cryptographically 
protected objects. 

10 http://lsewww.epfl .chZ-wilhelm/CryPO. html, 1997) . 
If the code is executed in software by a virtual 
machine interpreter (as is rr.ost often the case 
with Java bytecodes) , then it will always be 
possible for Bob to intercept and decompile the 

15 decrypted code. 

The Java™ programming language has gained 
popularity mainly because of its architecture 
neutral bytecode. While this clearly facilitates 
mobile code, it does decrease the performance by 

20 an order of magnitude in comparison to native 
code. Predictably, this hat3 lead to the 
development of just-in-time compilers that 
translate Java bytecodes to native code 
on- the- fly. Alice could make use of such 

25 translators to create native code versions of her 
application for all popular architectures. When 
downloading the application, the user's site would 
have to identify the architecture/operating system 
combination it is running, and the corresponding 

30 version would be transmitted, for example, as 
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shown in FIG. 4b. Only having access to the 
native code will make Bob's task more difficult, 
although not impossible. There is a further 
complication with transmitting 

5 native code. The problem is that un].ike Java 

bytecodes, which are subjected to bytecode 

verification before execution, native codes 

cannot be run with complete security on the user's 
machine. If Alice is a trusted member of the 

10 community, the user may accept her assurances that 
the application does not do anything harmful at 
the user's end. To make ^rare that no one tries to 
contaminate the applicacxon, Alice would have to 
digitally sign the codes as zhey are being 

15 transmitted, to prove to z'ae user that the code 
was the original one wriccen by her. 

The final approach v;e are going to consider 
is code obfuscation, for exaurple, as shown in FIG, 
5. The basic idea is for Alice to run her 

20 application through an obiuscator, a program that 
transforms the application into one that is 
functionally identical to che original but which 
is much more difficult for Bob to understand. It 
is our belief that obfuscation is a viable 

25 technique for protecting sofcware trade secrets 
that has yet to receive the attention uhat it 
deserves . 

Unlike server -side execution, code 
obfuscation can never completely protect: an 

3 0 application from malicious reverse engineering 
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efforts. Given enough time and determination. Bob 
will always be able to disaect Alice's application 
to retrieve its important algox'ithms and data 
structures. To aid this effort. Bob may try to run 
5 the obfuscated code through an automatic 

deobfuscator that attempts to undo the obfuscating 
transformations , 

Hence, the level of security from reverse 
engineering that an obf ascator adds to an 

10 application depends on, for example, (a) the 

sophistication of the transformations employed by 
the obfuscator, (b) the power of the available 
deobfuscation algorithms, and (c) the amount of 
resources (time and space) available to the 

15 deobfuscator. Ideally, u'e v/culd like to mimic the 
situation in current public- key cryptosystems , in 
which there is a dramatic difference in the cost 
of encryption (finding large primes is easy) and 
decryption (factoring large numbers is difficult) . 

20 We will see that there are, in fact, obfuscating 
transformations that can be applied in polynomial 
time but which require exponential time to 
deobfuscate, as discussed below. 

25 3 The Design of a Java Obfuscator 

FIG. 6 shows an architecture of Kava, the 
Java obfuscator. The main input to the tool is a 
set of Java class files and the obfuscation level 
required by the user. The user may optionally 

30 provide files of profiling data, as generated by 
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Java profiling tools. This information can be 
used to guide the obfuscator to make sure that 
frequently executed parts of the application are 
not obfuscated by very expensive transformations. 
5 Input to the tool is a Java application, given as 
a set of Java class files. The user also selects 
the required level of obfuscation (e.g., potency) 
and the maximum execution time/space penalty that 
the obfuscator is allowed co add to the 

10 application (the cost) . Kava reads and parses the 
class files along with any library files 
referenced directly or indirectly. A complete 
inheritance tree is constructed, as well as a 
symbol table giving type information for all 

15 symbols, and control flex graphs for all methods. 
Kava contains a large pool of code 
transformations, which are described below. 
Before these can be applied, however, a 
preprocessing pass collects various types of 

20 information about the application in accordance 
with one embodiment. Some kinds of information 
can be gathered using scandard compiler techniques 
such as inter-procedurai dataflow analysis and 
data dependence analysis, some can be provided by 

25 the user, and some are gathered using specialized 
techniques. Pragmatic analysis, for example, 
analyzes the application to see what sort of 
language constructs and programming idioms it 
contains . 
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The information gathered during the 
preprocessing pass is used to select and apply 
appropriate code transformations. All types of 
language constructs in the application can be the 
5 subject of obfuscation: for example, classes can 
be split or merged, methods can be changed or 
created, new control and data structures can be 
created and original ones ncdified. New constructs 
added to the application can be selected to be as 

10 similar as possible to the ones in the source 

application, based on the pragmatic information 
gathered during the preprocessing pass. 

The transformation process is repeated until 
the required potency has been achieved or the 

15 maximum cost has been exceeded. The output of che 
tool is a new application - - functionally 
equivalent to the original one normally given 
as a set of Java class files. The tool will also 
be able to produce Java source files annotated 

20 with information about which transformations have 
been applied, and how the obfuscated code relates 
to the original source. The annotated source will 
be useful for debugging. 

25 4 Classifying Obfuscating Transformations 

In the remainder of this detailed description 
of the preferred embodiments we will describe, 
classify, and evaluate various obfuscating 
transformations. We start by formalizing the 

30 notion of an obfuscating transformation: 
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Definition 1 (Obfuscating Transformation) 

Let P -'^-> P' be a legal obfuscating transformation 

in which the following conditions must hold: 

5 

- If P fails to terminate or terminates with 
an error condition, then P' may or may not 
terminate . 

10 - Otherwise, P' must te-minate and produce 

the same output as F. 

Observable behavior is defined loosely as 
"behavior as experienced by the user." This means 

15 that P' may have side-effects {such as creating 

files or sending messages over the Internet) that 
P does not, as long as these side effects are not 
experienced by the user. Note that we do not 
require P and P' to be equally efficient. In fact, 

20 many of our transformations v;ill result in P' 
being slower or using more memory than P . 

The main dividing line between different 
classes of obfuscation techniques is shown in FIG. 
2c. We primarily classify an obfuscating 

25 transformation according to the kind of 
information it targets. Some simple 
transformations target the lexical structure (the 
layout) of the application, such as source code 
formatting, and names of variables. In one 

3 0 embodiment, the more sophisticated transformations 
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that we are interested in,, target either the data 
structures used by the application or its flow of 
control . 

Secondly, we classify a transformation 
5 according to the kind of operation it performs on 
the targeted information. As can be seen from 
FIGs. 2d through 2g, there are several 
transformations that manipulate the aggregation of 
control or data. Such transformations typically 

10 break up abstractions created by the programmer, 
or construct new bogus abstractions by bundling 
together unrelated data or control . 

Similarly, some transformations affect the 
ordering of data or control. In many cases the 

15 order in which two items are declared or two 

computations are performed has no effect: on the 
observable behavior of the program. There can, 
however, be much useful information embedded in 
the chosen order, to the programmer who wrote the 

20 program as well as to a reverse engineer. The 
closer two items or events aire in space or time, 
the higher the likelihood that they are related in 
one way or another. Ordering transformations try 
to explore this by randomizing the order of 

25 declarations or computations. 

5 Evaluating Obfuscating Transformations 

Before we can attempt to design any 
obfuscating transformations, we should be able to 
30 evaluate the quality of such a transfoi'mation. In 
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this section we will attempt to classify 
transformations according to several criteria: how 
much obscurity they add to the program (e.g., 
potency) , how difficult they are to break for a 
5 deobfuscator (e.g., resilience), and how much 

computational overhead they add to the obfuscated 
application (e.g., cost). 

5,1 Measures of Potency 

10 We will first define what it means for a 

program P' to be more obscure (or complex or 
unreadable) than a program ?. Imy such metric 
will, by definition, be r^Blatively vague, because 
it must be based (in part) on human cognitive 

15 abilities. 

Fortunately, we can draw upon the vast body 
of work in the Software Complexity Metrics branch 
of Software Engineering. In this field, metrics 
are designed with the intent to aid the 

20 construction of readable, reliable, and 

maintainable software. The iTietrics are frequently 
based on counting various textual properties of 
the source code and combinir^g these counts into a 
measure of complexity. While some of the formulas 

25 that have been proposed have been derived from 
empirical studies of real programs, others have 
been purely speculative. 

The detailed complexj^ty formulas found in the 
metrics* literature can be used to derive general 

3 0 statements, such as: "ii programs P and ?' are 



SUBSTITUTE SHEET (RULE 26) 



wo 99/01815 PCT/US98/12017 

identical except that P' contains more of property 
q than P, then P' is more complex than P." Given 
such a statement, we can attempt to construct a 
transformation that adds more- of the q-property to 
5 a program, knowing that this is likely to increase 
its obscurity. 

FIG. 7 is a table that tabulates some of the 
more popular complexity measures, in which E{x) is 
the complexity of a software component x, F is a 

10 function or method, C is a class, and ? is a 
program. When used in a softv;are construction 
project the typical goal to minimize these 
measures. In contrast, when obfuscating a program 
we generally want to maxiip^ize the measures. 

15 The complexity metrics allow us tc formalize 

the concept of potency and will be used below as a 
measure of the usefulness of a transformation. 
Informally, a transformation is potent if it does 
a good job confusing Bob, by hiding the intent of 

20 Alice's original code. In ether words, the potency 
of a transformation measures how much more 
difficult the obfuscated code is to understand 
(for a human) than the ciiginal code. This is 
formalized in the following definition: 

25 

Definition 2 (Transformation Potency) Let T be a 
behavior- conserving transf oriuation, such that 
P -'^-> P' transforms a source program P into a 
target program P' , Let E(P) be the complexity of 
30 P, as defined by one of uhe metrics of FIG. 7. 
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Tpot(P)/ the potency of T with respect to a 
program P, is a measure of the extent to which T 
changes the complexity of P . It is defined as 

Tpot(P) = E(P-')/E{P)-l. 
T is a potent obfuscating transformation if 
Tpot(P) > 0. 



For the purposes of thir^ discussion, we will 
measure potency on a three -point scale, (lew, 
10 medium, high) . 

The observations in Table 1 make it possible 
for us to list some desirable properties of a 
transformation T. In order for T to be a potent 
obfuscating transformation, it should 
15 - increase ox^erall program size (u-,) and 

introduce new cLasHes and methods (u%) . 

- introduce new predicates (Us) and 
increase the nesting level of 
conditional and looping constructs 

20 (U3) . 

- increase the number of method 
arguments (U5) and inter- class instance 
variable dependencies {u*^7 ) . 

- increase the height of the inheritance 
25 . tree (u^'%) . 

- increase long-range variable 
dependencies (U4) . 
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5.2 Measures of Resilience 

At first glance it would seem that increasing 
Tpot^P ) would be trivial. To increase the 
metric, for example, all we have to do is to add 
5 some arbitrary if -stateraenns to ?: 



main ( ) { 
SI; 

32; ='^=> 



main ( ) { 
SI; 

if (5==2) SI; 
S2;} 

if (1>2) S2; 



Unfortunately, such transformations are 
15 virtually useless / because they can easily be 
undone by simple automatic techniques. It is 
therefore necessary to introduce the concept of 
resilience, which measures how well a 
transformation holds up under attack from an 
20 automatic deobf uscator . For example, the 

resilience of a transf ormaticn T can be seen as 
the combination of two measures: 



Programmer Effort: 'che amount of time 
25 required to construct an automatic 

deobfuscator that is able to effectively 
reduce the potency of T , and 



Deobfuscator Effort: the execution time and 
3 0 space required by such an automatic 

-33" 



SUBSTITUTE SHEET (RULE 26) 



wo 99/01815 



PCT/US98/12017 



deobfuscator to effectively reduce the 
potency of T . 

It is important to distinguish between 
5 resilience and potency. A transformation is potent 
if it manages to confuse a human reader, but it is 
resilient if it confuses an automatic 
deobfuscator . 

We measure resilience on a scale from trivial 

10 to one way, as shown in FIG. 8a. One-way 

transformations are special, in the sense that 
they can never be undone. This is typically 
because they remove information from the program 
that was useful to the human programmer, but which 

15 is not necessary in order to execute the program 
correctly. Examples include transformations that 
remove formatting, and scramble variable names. 

Other transformations typically add useless 
information to the program that does not change 

20 its observable behavior, but which increases the 
"information load" on a huuian reader. These 
transformations can be undone with varying degrees 
of difficulty, 

FIG. 8b shows that deobfuscator effort is 

25 classified as either polynoni:'.ai time or 

exponential time. Programmer effort, the work 
required to automate the deobfuscation of a '■ 
transformation T, is measured as a function of the 
scope of T. This is based on the intuition that 

30 it is easier to construct counter-measures against 
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an obfuscating transformation that only affects a 
small part of a procedure, than against one that 
may affect an entire program. 

The scope of a transf oirination is defined 
5 using terminology borrov/ed from code optimization 
theory: T is a local transsf ormation if it affects 
a single basic block of a control flow graph 
(CFG) , it is global if ir: affects an entire CFG, 
it is inter-procedural if it affects the flow of 
10 information between procedures, and it is an 
interprocess transformation if it affects the 
interaction between independently executing 
threads of control . 



15 Definition 3 (Transformation Resilience) Let T be 
a behavior-conserving transformation, such that 
P ='^=> P' transforms a scarce program P into a 
target program P' . Tres(P) i^' the resilience of T 
with respect to a prograim ?. 

20 Tres(P) is a one-way transformation if 

information is removed from P such that P cannot 
be reconstructed from P** . Otherwise, 

Tres ~ Resilience ( J-Deobfuscator effort' ^Programmer 
effort) ' 

25 in which Resilience is the function defined in the 
matrix in FIG. 8b. 



5.3 Measures of Execution Cost 

In FIG. 2b, we see that potency and 
30 resilience are two of the three components 
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describing the quality of a transformation. The 
third component, the cost of a transformation, is 
the execution time or space penalty that a 
transformation incurs on an obfuscated 
5 application. We classify the cost on a four-point 
scale (free, cheap, costly, dear) , in which each 
point is defined below: 

Definition 5 (Transformation Cost) Let T be a 

10 behavior- conserving trai.isf ormation, such that 

Tcost(^) ^ (dear, costly, cheap, free} with T^^^^iP) 
= free, if executing P' requires 0(1) more 
resources than P; otherwit^e T^ost^P) = cheap, if 
executing P' requires 0(n) more resources than P; 

15 otherwise Tcost(P) = costly, if executing P' 

requires O(n^), with p>l, more resources than P; 
otherwise T^ost^P) = dear (i.e., executing P' 
requires exponentially more resources than P) . 
It should be noted that the actual cost 

20 associated with a transformation depends on the 

environment in which it is applied. For example, a 
simple assignment statement a=5 inserted at the 
top-most level of a program will only incur a 
constant overhead. The same statement inserted 

25 inside an inner loop will have a substantially 
higher cost. Unless noted Cuherwise, we always 
provide the cost of a tran,u format ion as if it had 
been applied at the oucerinosc nesting level of the 
source program. 
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5.4 Measures of Quality 

We can now give a formal definition of the 
quality of an obfuscating transformation: 

5 

Definition 6 (Transformation Quality) 
Tquai(P), the quality of a transformation T, is 
defined as the combination of the potency, 
resilience, and cost of T: T^uai(P)= (Tpot(P)/ 
10 Tres(P) / Tcost(P) ) • 

5.5 Layout Transformations 

Before we explore novel transformations, we 
will briefly consider the trivial layout 

15 transformations, which, for example, are typical 
of current Java obfuscators such as Crema. (Hans 

Peter Van Vliet. Crema The Java obfuscator. 

http : / /web . inter . nl , net /users /H . P . van . 
Vliet/crema.html, January 1996) The first 

20 transformation removes the source code formatting 
information sometimes available in Java class 
files. This is a one-v/ay transformation, because 
once the original formattir.g is gone it cannot be 
recovered; it is a transformation with low 

25 potency, because there is very little semantic 

content in formatting, and no great confusion is 
introduced when that information is removed; 
finally, this is a free tz'ans format ion, because 
the space and time complexity' of the application 

30 is not affected. 
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Scrambling identifier names is also a one-way 
and free transformation. However, it has a much 
higher potency than formatting removal, because 
identifiers contain a great deal of pragmatic 
5 information. 

6 Control Transformations 

In this and the next tew sections we will 
present a catalogue of obfu;icating 

10 transformations. Some have been derived from 
well-known transformations used in other 
areas such as compiler optimization and software 
reengineering, others have been developed for the 
sole purpose of obfuscation, in accordance with 

15 one embodiment of the present invention. 
In this section we will discuss 
transformations that attempc to obscure the 
control-flow of the source application. As 
indicated in FIG. 2f , we classify these 

20 transformations as affeccing the aggregation, 

ordering, or computations of the flow of control. 
Control aggregation transformations break up 
computations that logically belong together or 
merge computations that do noc. Control ordering 

25 transformations randomize the order in which 
computations are carried out. Computation 
transformations can insert new (redundant or dead) 
code, or make algorithmic changes to the source 
application. 
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For transformations that alter the flow of 
control, a certain amount of computational 
overhead will be unavoidable . For Alice this 
means that she may have to choose between a highly 
5 efficient program, and one that is highly 

obfuscated. An obfuscator can assist her in this 
trade-off by allowing her to choose between cheap 
and expensive transformations. 

10 6,1 Opaque Predicates 

The real challenge when designing 
control -altering transf c:: iT:ations is to make them 
not only cheap, but also resistant to attack from 
deobf uscators . To achieve chis, many 

15 transformations rely on the existence of opaque 
variables and opaque predicates. Informally, a 
variable V is opaque if it has some property q 
that is known a priori to the obfuscator, but 
which is difficult for the deobf uscator to deduce. 

20 Similarly, a predicate F (a Boolean expression) is 
opaque if a deobfuscator can deduce its outcome 
only with great difficulty, while this outcome is 
well known to the obfuscator. 

Being able to create opaque variables and 

25 predicates that are difficult for a deobfuscator 
to crack is a major challenge to a creator of 
obfuscation tools, and the key to highly resilient 
control transformations. We measure the resilience 
of an opaque variable or predicate (i.e., its 

30 resistance to deobf uscat ion attacks) on the same 
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scale as transformation resilience (i.e., trivial, 
weak, strong, full, one-way) . Similarly, we 
measure the added cost of an opaque construct on 
the same scale as transformation cost (i.e., 
5 free, cheap, costly, dear) . 

Definition 7 (Opaque Constructs) A variable V is 
opaque at a point p in a progi-am, if V has a 
property q at p, which is known at obfuscation 

10 time. We write this as V^p or if p is clear from 
the context. A predicate P is opaque at p if its 
outcome is known at obfuscation time. We write P^p 
(P^p) if P always evaluates to False (True) at p, 
and P'p if P sometimes evaluates to True and 

15 sometimes to False. Again, p will be omitted if 
clear from the context. FIG. 3 shows different 
types of opaque predicates. Solid lines indicate 
paths that may sometimes be taken, and dashed 
lines indicate paths that '/.ill never be taken. 

20 Below we give some examples of simple cpaq'ue 

constructs. These are easy to construct for the 
obfuscator and equally easy zo crack for the. 
deobfuscator . Section 8 provides examples of 
opaque constructs with much higher resilience. 

25 

6.1,1 Trivial and Weak Opaque Constructs 
An opaque construct is -rivial if a 
deobfuscator can crack it (i.e., deduce its value) 
by a static local analysis. An analysis is local 
30 if it is restricted to a single basic block of a 
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control flow graph. FIGs. 10a and 10b provide 
examples of (a) trivial opaque constructs and (b) 
weak opaque constructs. 

We also consider an opaque variable to be 
5 trivial if it is computed from calls to library 
functions with simple, well -understood semantics. 
For a language like the Java^".. language which 
requires all implementations to support a standard 
set of library classes, such opaque variables are 

10 easy to construct. A simple example is int v^[l,5] 
= random (1,5), in v/hich random (a, b) is a library 
function that returns an iuceger in the range a . 
. , b. Unfortunately, such opaque variables are 
equally easy to deobfuscate. All that is required 

15 is for the deobfuscator -designer to tabulate the 
semantics of all simple library functions, and 
then pattern-match on the function calls in the 
obfuscated code. 

An opaque construct is weak if a deobfuscator 

20 can crack it by a static global analysis. An 

analysis is global if it Is restricted to a single 
control flow graph. 

6.2 Computation Transf o::maticns 

25 Computation Transformations fall into three 

categories: hide the real control-flow behind 
irrelevant statements that do not contribute i:o 
the actual computations, introduce code sequences 
at the object code level for which there exist no 

30 corresponding high-level language constructs, or 
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remove real control- flow abstractions or introduce 
spurious ones . 

6.2.1 Insert Dead or Irrelevant Code 

5 The U2 and U3 metrics suggest that there is a 

strong correlation between the perceived 
complexity of a piece of code and the number of 
predicates it contains. U^ing opaque predicates, 
we can devise transformations that introduce new 

10 predicates in a program. 

Consider the basic block S = . . , Sn in 
FIG. 11. In FIG. lla, we insert an opaque 
predicate P*^ into S, essentisilly splitting it in 
half. The predicate is irrelevant code, because 

15 it will always evaluate to True. In FIG. lib, we 

again break S into two halves, and then proceed to 
create two different obfuscated versions and 
of the second half. i^nd S' will be created by 

applying different sets of oi^/Z uscating 

20 transformations to the second half of S. Hence, it 
will not be directly obvious co a reverse engineer 
that and in fact peiiorrri the same function. 
We use a predicate P' to select between S'^ ana 
at runtime. 

25 FIG. 11c is similar tu FIG. lib, but this 

time we introduce a bug inco S^. The P*^ predicate 
always selects the correct version of the code, S^. 

6.2.2 Extend Loop Conditions 

"42-- 
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FIG. 12 shows how we can obfuscate a loop by 
making the termination condition more complex. The 
basic idea is to extend the loop condition with a 
p"^ or predicate that v/ill not affect the number 
5 of times the loop will execute. The predicate we 
have added in FIG. 12d, for example, will always 
evaluate to True because (x+l) ^=0 (mod4) . 

6.2.3 Convert a Reducible co a Won-Reducible Flow 
10 Graph 

Often, a programming language is compiled to 
a native or virtual machine code, which is more 
expressive than the language itself. When this is 
the case, it allows us to devise language -breaking 

15 transformations. A transformation is 

language -breaking if it in-ci.oduces virtual raachine 
(or native code) instruction sequences that have 
no direct correspondence with any source language 
construct. When faced wi/ch such instruction 

20 sequences a deobfuscatcr will either have to try 
to synthesize an equivalent (but convoluted) 
source language program or give up altogether. 

For example, the Java^'' bytecode has a goto 
instruction, but the Java™ language has no 

25 corresponding goto statement . This irieans that the 
Java'** bytecode can express arbitrary control flow, 
whereas the Java^"* language can only (easily) 
express structured control flow. Technically, we 
say that the control flow graphs produced from 

30 Java™ programs will always be reducible, but the 
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Java™ bytecode can express non-reducible fiov; 
graphs . 

Since expressing non-reducible flow graphs 
becomes very awkward in languages without gotos, 
5 we construct a transf oriuation that converts a 

reducible flow graph to a non-reducible one. This 
can be done by turning a structured loop into a 
loop with multiple headers. For example, in FIG. 
13a, we add an opaque predicate to a while 

10 loop, to make it appear that there is a jump into 
the middle of the loop. In fact, this branch will 
never be taken. 

A Java™ decompiler v/ould have to turn a 
non-reducible flow graph into one which either 

15 duplicates code or which contains extraneous 

Boolean variables. Alternatively, a deobfuscator 
could guess that all non-reducible flow graphs 
have been produced by an cbfuscator, and sitaply 
remove the opaque predicate. To counter this we 

20 can sometimes use the alternative transformation 
shown in FIG 13b. If a deobfuscator blindly 
removes P^, the resulting code will be incorrect. 

In particular, FIGs. 13a. and 13b illustrate a 
transformation for transforming a Reducible flow 

25 graph to a Non-Reducible Flow graph. In FIG. 13a, 
we split the loop body 32 xnto two parts (S'*2 and 
3^2) / insert a bogus juviip to the beginning of 

S*^2- FIG. 13b, we also break SI into two parts, 
S^i and S^i. S^^ is moved inco cne loop and an opaque 

30 predicate P^ ensures that S'\ is always executed 
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before the loop body, A second predicate Q^ensures 
that S*^i is only executed once . 

6.2.4 Remove Library Calls and Programming Idioms 
5 Most programs written in Java rely heavily on 

calls to the standard libraries. Because the 
semantics of the library functions are well known, 
such calls can provide useful clues to a revei-se 
engineer. The problem is exacerbated by the fact 

10 that references to Java library classes are always 
by name, and these names ca^r^nct be obfuscated. 

In many cases the cbfuscator will be able to 
counter this by simply providing its ovm versions 
of the standard libraries. For example, calls to 

15 the Java Dictionary class (v.-hich uses a hash table 
implementation) could be turned into calls to a 
class with identical behavior, but implemented as, 
for example, a red-black tree. The cost of this 
transformation is not sc tauch in execution time, 

20 but in the size of the program. 

A similar problem occurs with cliches (or 
patterns) , common programming idioms that occur 
frequently in many applications. An experienced 
reverse engineer will search for such patterns to 

25 jump-start his understaiidinvj of an unfamiliar 

program. As an example, consider linked lists in 
Java™. The Java^'^ library has no standard class 
that provides common list operations such as 
insert, delete, and enumerate. Instead, mosc Java™ 

30 programmers will construct lisi-s of objects in an 
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ad hoc fashion by linking them together on a next 
field. Iterating through such lists is a very 
common pattern in Java"* programs. Techniques 
invented in the field of automatic program 
5 recognition (see Linda Mary Wills. Automated 

program recognition: a feasibility demonstration. 
Artificial Intelligence, 45 (1--2) :113--172, 1990, 
incorporated herein by reference) can be used to 
identify common patterns and replace them with 
10 less obvious ones. In the linked list case, for 

example, we might represer^t che standard list data 
structure with a less ccimon one, such as cursors 
into an array of elemencs . 

15 6.2.5 Table Interpretation 

One of the most effecuive (and expensive) 
transformations is table interpretation. The idea 
is to convert a section of code (Java byteccde in 
this example) into a differenc virtual machine 

20 code. This new code is then executed by a virtual 
machine interpreter included with the obfuscated 
application. Obviously, a particular application 
can contain several interpreters, each accepting a 
different language and executing a different 

25 section of the obfuscated application. 

Because there is usually an order of 
magnitude slow down for each level of 
interpretation, this transformation should be 
reserved for sections of code that make up a small 
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part of the total runtime or which need a very 
high level of protection. 

6.2.6 Add Redundant Operands 

5 Once we have constructed some opaque 

variables we can use algebraic laws to add 
redundant operands to arithmetic expressions. This 
will increase the u^ metric. Obviously, this 
technique works best with integer expressions 

10 where numerical accuracy is not an issue. In the 
obfuscated statement (1*) helow we make use of an 
opaque variable ? whose value is 1. In statem.ent 
(2*) we construct an opaque subexpression P/Q 
whose value is 2. Obviously, ive can let P and Q 

15 take on different values during the execution of 
the program, as long as their qxiotient is 2 
whenever statement (2*) is reached. 

(1) X=X+V; =^=> (1^) X=X+V*P"^ ; 

20 (2) Z=L+1; 12') Z=L+ ( P^^^/Q"^^''^ ) /2 . 

6.2.7 Parallelize Code 

Automatic parallelization is an important 
compiler optimization used ::o increase the 
25 performance of applications running on 

multi-processor machines. Our reasons for wanting 
to parallelize a program, of course, are 
different. We want to increase parallelism not to 
increase performance, but to obscure the actual 
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flow of control. There are two possible operations 
available to us : 

1 . We can create dummy processes that 
5 perform no useful rask, and 

2. We can split a sequential section of 
the application code into multiple 
sections executing in parallel. 

10 

If the application is running on a 
single-processor machine, we can expect these 
transformations to have a significant execution 
time penalty. This may be acceptable in many 

15 situations, because the resilience of these 
transformations is high: static analysis of 
parallel programs is very difficult, because the 
number of possible execution paths through a 
program grows exponentially with the number of 

20 executing processes. ?aralleiii:ation also yields 
high levels of potency: a reverse engineer v/ill 
find a parallel program much more difficult to 
understand than a sequential one. 

As shown in FIG. 14, a section of code can be 

25 easily parallelized if it concains no data 

dependencies. For example, if Si and are two 
data -independent statements they can be run in 
parallel. In a programming language like the 
Java™ language that has xio explicit parallel 
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constructs, programs can be parallelized using 
calls to thread (lightweight process) libraries. 

As shown in FIG. 15, a section of code that 
contains data dependencies can be split into 
5 concurrent threads by inserting appropriate 
synchronization primitives; such as await and 
advance (see Michael Wolfe. High Performance 
Compilers For Parallel Corapucing. Addison- Wesley , 
1996. ISBN 0-8053-2730-4, incorporated herein by 
10 reference) . Such a program will essentially be 

running sequentially, but the flow of control will 
be shifting from one thread tc the next. 

6.3 Aggregation Transformations 

15 Programmers overcome ':he inherent complexity 

of programming by introducing abstractions. There 
is abstraction on many levels of a program, but 
the procedural abstraction is the most important 
one. For this reason, obscaring procedure and 

20 method calls can be impox^tanc to the obfuscator. 
Below, we will consider several ways in which 
methods and method invocacxcns can be obscured: 
inlining, outlining, interleaving, and cloning. 
The basic idea behind all of these is the same: 

25 (1) code that the programmer aggregated into a 

method (presumably because it logically belonged 
together) should be broken up and scattered over 
the program and (2) code chat seems not to belong 
together should be aggregated into one method. 
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6.3.1 Inline and Outline Methods 

Inlining is, of course, a important compiler 
optimization. It is also an extremely useful 
5 obfuscation transformation, because it removes 

procedural abstractions from the program. Inlining 
is a highly resilient transformation (it is 
essentially one-way) , because once a procedure 
call has been replaced with che body of the called 

10 procedure and the procedure itself has been 

removed, there is no trace of the abstraction left 
in the code. FIG. 16 sho/^o hcv/ procedures P and Q 
are inlined at their call -sites, and then removed 
from the code . 

15 Outlining (turning a ^eequence of statements 

into a subroutine) is a very useful companion 
transformation to inlining. V7e create a bogus 
procedural abstraction by extracting the beginning 
of Q's code and the end oi\ ? *g code into a new 

20 procedure R. 

In object-oriented languages such as the 
Java™ language, inlining may, in fact, not always 
be a fully one-way transf ornacion. Consider a 
method invocation m,P(). The actual procedure 

25 called will depend on the run-time type of m. In 
cases when more than one meuhod can be invoked at 
a particular call site, we inline all possible 
methods (see Jeffr-ey Dean. Whole-Program 
Optimization of Object-Oriented Languages. PhD 

30 thesis, University of Washington, 1996, 
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incorporated herein by reference) and select the 
appropriate code by branching on the type of m) . 
Hence, even after inlining and removal of methods, 
the obfuscated code may still contain some traces 
5 of the original abstractions. For example, FIG. 
17 illustrates inlining method calls. Unless v;e 
can statically determine the type of m, all 
possible methods to which v:i.P{) could be bound 
must be inlined at the call site. 

10 

6.3.2 Interleave Methods 

The detection of interleaved code is an 
important and difficult revzrrze engineering task. 
FIG, 18 shows how we can interleave two 

15 methods declared in the aaivie class. The idea is to 
merge the bodies and paramet(H:r lists of the 
methods and add an extra parameter (or global 
variable) to discriminate between calls to the 
individual methods. Ideally, the methods should be 

20 similar in nature to allov; merging of common code 
and parameters. This is tr.e case in FIG. IB, in 
which the first parameter oc Ml and M2 have 
the same type. 

25 6.3.3 Clone Methods 

When trying to undtf;r£.icand the purpose of a 
subroutine a reverse engineer v;ill of course 
examine its signature and body. However, eq-aally 
important to understanding the behavior of the 

3 0 routine are the different environments in which it 

-51 - 



SUBSTITUTE 3H£u f (RULE 26) 



wo 99/01815 



PCT/US98/12017 



is being called. We can make this process more 
difficult by obfuscating a method's call sites to 
make it appear that different routines are being 
called, when, in fact, this is not the case. 
5 FIG. 19 shows how we can create several 

different versions of a method by applying 
different sets of obfuscating transformations to 
the original code. VJe use method dispatch to 
select between the different versions at runtime. 

Method cloning is s im j_ J. a r to the predicate 
insertion transformations in FIG. 11, except that 
here we are using method di^.patch rather than 
opaque predicates to select between different 
versions of the code . 

6.3.4 Loop Transformations 

A large number of loop transformations have 
been designed with the intent to improve the 
performance of (in part iculai': ) numerical 
applications. See Bacon [2] for a comprehensive 
survey. Some of these transformations are useful 
to us, because they also increase the complexity 
metrics, which are discussed above with respect to 
FIG. 7. Loop Blocking, as i=jhown in FIG. 2 0a, is 
used to improve the cache behavior of a locp by 
breaking up the iteration ypace so that the inner 
loop fits in the cache. Loop unrolling, as shown 
in FIG. 20b, replicates the body of a loop one or 
more times. If the loop bounds are known at 
compile time the loop can be unrolled in its 
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entirety. Loop fission, as shown in FIG. 20c, 
turns a loop with a compound body into several 
loops with the same iteration space. 

All three transformations increase the and 
5 Uj metrics, because they increase the source 
application's total code size and number of 
conditions. The loop blocking transformation also 
introduces extra nesting, ano hence also increases 
the U3 metric, 

10 Applied in isolation, the resilience of these 

transformations is quite lo\*;. It does not require 
much static analysis for a dsobfuscator to reroll 
an unrolled loop. However, when the 
transformations are combined, the resilience rises 

15 dramatically. For example, civen the simple loop 
in FIG. 20b, we could first apply unrolling, then 
fission, and finally blocking. Returning the 
resulting loop to ius origiLval form would require 
a fair amount of analysis for zhe deobf uscator . 

20 

6.4 Ordering Transf onriacions 

Programmers tend to organize their source 
code to maximize its locality. The idea is that a 
program is easier to read acd understand if two 

25 items that are logically related are also 

physically close in the source text. This kind 'of 
locality works on every level of the source: for 
example, there is locality among terms within 
expressions, statements witi;.in basic blocks, basic 

30 blocks within methods, meuhods within classes, and 
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classes within files. All kinds of spatial 
locality can provide useful clues to a i-everse 
engineer. Therefore, v/heneiver possible, we 
randomize the placement of any item in the source 
5 application. For some typet-: of items (methods 
within classes, for example) this is trivial. In 
other cases (such as statements within basic 
blocks) a data dependency analysis (see David F. 
Bacon, Susan L. Graham, and Oliver J. Sharp. 
10 Compiler transformations for high-performance 

computing. ACM Computing i^urveys, 26 (4 ): 345- -420 , 
December 1994. http:// 

www.acm. org/pubs/toc/Absc,racts/03G0-0300/ 
197406.html. and Michael Wolfe. High Performance 

15 Compilers For Parallel Commuting. Addison-Wesley, 
1996. ISBN 0-8053-2730-4, incorporated herein by 
reference) is performed to determine which 
reorderings are technically Vc;;lid, 

These transformations have low potency (they 

20 do not add much obscurity to the program) , but 
their resilience is high, in many cases one-way. 
For example, when the placement of statements 
within a basic block has been randomized, there 
will be no traces of the original order left in 

25 the resulting code. 

Ordering transformations can be particularly 
useful companions to the ''Inline -Outline" 
transformation of Section 6.3.1. The potency of 
that transformation can be enhanced by (1) 

30 inlining several procedure calls in a procedui^e P, 
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(2) randomizing the order of the statenter.ts in P, 
and (3) outlining contiguous sections of P's 
statements. This way, unrelated statements that 
were previously part of several different 
5 procedures are brought together into bogus 
procedural abstractions . 

In certain cases it is also possible to 
reorder loops, for exa-it/ple by running them 
backwards. Such loop rever^sal transformations are 

10 common in high-performance compilers (David F, 
Bacon, Susan L. Graham, and Oliver J. Sharp. 
Compiler transformations for high-performance 
computing. ACM Computing Surveys, 26 (4) : 345- -420 , 
December 1994. http:// 

1 5 www . acm . org / pubs / 1 oc / Abs tract s/ 0360-0300/ 
197406.html . ) . 

7 Data Transformations 

In this section v/e Wx...! discuss 
20 transformations rhat obscure the data structures 
used in the source application. As indicated in 
FIG. 2e, we classify these transformations as 
affecting the storage, encoaing, aggregation, or 
ordering of the data. 

25 

7,1 Storage and Encoding Transformations 

In many cases there is a "natural" way to 
store a particular data item in a program. For 
example, to iterate through the elements of an 
3 0 array we probably would choose to allocate a local 
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integer variable of the appropriate size as the 
iteration variable. Other variable types might be 
possible, but they would be less natural and 
probably less efficient. 
5 Furthermore, there is c.lso often a "natural" 

interpretation of the bit-patterns that a 
particular variable can hold vv'hich is based on the 
type of the variable. For cxaraple, we v;ould 
normally assume that a 16 -bit integer variable 
10 storing the bit-pattern OOOOCOOOOOOOllOO would 
represent the integer value 12. Of course, these 
are mere conventions and otLer interpretations are 
possible . 

Obfuscating storage transformations attempt 
15 to choose unnatural storage; classes for dynamic as 
well as static data. Sirailarly, encoding 
transformations attempt to choose unnatural 
encodings for common data cypes . Storage and 
encoding transf onnations cf^^ea go hand- in -hand, 
20 but they can sometimes be used in isolation. 

7.1.1 Change Encoding 

As a simple example of an encoding 
transformation we will replace an integer variable 
25 i by io = Ci * i + C2 , where and C2 are 

constants. For efficiency, we could choose to be 
a power of two. In the example below, we let Ci = 8 
and C2 = 3 : 
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{ 



int i=l; int i=ll; 

while (i < 1000) while (i < 8003) 

. . . A[i] . . .; , . A[(i-3)/8] 

i++; i+=8; 

} } 



Obviously, overflow (and, in case of floating 
point variables, accuracy) issues need to be 

10 addressed. We could either determine that because 
of the range of the variable (the range can be 
determined using static analysis techniques or by 
querying the user) in quesition no overflow will 
occur, or we could change to a larger variable 

15 type . 

There will be a trade-off between resilience 
and potency on one hand, and cost on the other. A 
simple encoding function such as Iq = -f i + 03 
in the example above, v;ill add little extra 

20 execution time but can be deobfuscated using 
common compiler analysis techniques (Michael 
Wolfe. High Performance Compilers For Parallel 
Computing. Addison-Wesiey , 1996. ISBN 
0-8053-2730-4. and David F. Bacon, Susan L. 

25 Graham, and Oliver J. Sharp. Compiler 

transformations for high-performance computing. 
ACM Computing Surveys, 26* (4) : 345--426 , DecetrJoer 
1994. http:// 

www.acm.Org/pubs/toc/Absi:racts/0360-0300/ 
30 197406.html). 
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7.1.2 Promote Variables 

There are a number of simple storage 
transformations that promote variables from, a 
5 specialized storage class to a more general class. 
Their potency and resilience are generally low, 
but used in conjunction with other transformations 
they can be quite effective. For example, in Java, 
an integer variable can be promoted to an integer 
10 object. The same is true of the other scalar types 
which all have corresponding '"-packaged" classes. 
Because Java™ supports garbage collection, the 
objects will be automatically removed when they 
are no longer referenced. Here is an example: 

15 

{ { 

int 1=1; int i = new intd); 

while (i < 9) =^==> while (Lvalue < 9) 

. . .A[i] , . . ; . . . A [i .value] . . . ; 

.20 i++; i.value++; 

} } 

It is also possible to change the lifetime of 
a variable. The simplest :-;uch transform turns a 

25 local variable into a globc.l one which is then 

shared between independent procedure invocations. 
For example, if procedures P and Q both reference 
a local integer variable, and P and Q cannot both 
be active at the same time (unless the program 

30 contains threads, this can be determining by 
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examining the static call graph) then the variable 
can be made global and shared between them: 

void P{) { int C; 

5 int i ; . . . I . . . void P ( ) { 

} ...C... 

} 

void Q()( ='^=> while {i.value<9) 

int k ; . . . k . . . . . . C . . . 

10 } } 

This transformation increases the metric, 
because the number of global data structures 
referenced by P and Q is increased. 

15 

7.1.3 Split Variables 

Boolean variables and other variables of 
restricted range can be split into two or more 
variables. We v/ill write a vax'iable V split into k 
20 variables p^, . . . , p^^. as V ^ tPi, • ■ * f PkJ - 

Typically, the potency of zhis transformation will 
grow with k. Unfortunately, so will the cost of 
the transformation, so we usually restrict k to 2 
or 3. 

25 To allow a variable V of type T to be split 

into two variables p and q of t-ype U requires us 
to provide three pieces of information: (1) a 
function f (p; q) that maps 'che values of p and q 
into the corresponding value of V, (2) a function 

30 g(V) that maps the value of V into the 
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corresponding values of p and q, and (3) new 
operations (corresponding to the primitive 
operations on values of type T) cast in terms of 
operations on p and q. In the remainder of this 
5 section we will assume that V is of type Boolean, 
and p and q are small integer variables. 

FIG. 21a shows a possible choice of 
representation for splxt Boolean variables. The 
table indicates that if V has been split into p 

10 and q, and if, at some pciyic in the program, p = q 
= Oorp = q=l, then tha'c corresponds to V being 
False. Similarly, p = 0., q =^ 1 or p = 1, q = 0 
corresponds to True . 

Given this new representation, we have to 

15 devise substitutions for various built-in Boolean 
operations (e.g., Sc, or). One approach is to 
provide a run -time lockup table for each operator. 
Tables for "AND" and ''OR'' are shov/n in FIGs. 21c 
and 2 Id, respectively. Given tv/o Boolean variables 

20 Vi = [p, q] and = [r, sj , & V2 is computed as 
AND [2p + q, 2r + s] . 

In FIG. 21e, we show the result of splitting 
three Boolean variables A^^Lal,a2], B=[bl,b2], and 
C=[cl,c2]. An interesting aspect of our chosen 

25 representation is that there are several possible 
ways to compute the same BoGle^m expression. 
Statements (3') and (4') in FIG. 21e, for example, 
look different, although they both assign False to 
a variable. Similarly, while statements (5') and 
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iS') are completely different, they both compute A 
& B. 

The potency, resilience, and cost of this 
transformation all grow with the number of 
5 variables into which the original variable is 

split. The resilience can be further enhanced by 
selecting the encoding at run- time. In other 
words, the run- time look-up tables of FIGs. 21b 
through 21d are not constructed at compile-time 

10 (which would make them susceptible to static 
analyses) but by algorithms included in the 
obfuscated application. Thifj. of course, would 
prevent us from using in-line code to compute 
primitive operations, as done in statement in 

15 FIG. 21e. 

7.1.4 Convert Static to Procedural Data 

Static data, particularly character strings, 
contain much useful pragmat.ic information to a 

20 reverse engineer. A technique for obfuscating a 

static string is to convert it into a program that 
produces the string. The pzogram which could be 
a DFA or a Trie traversal could possibly 
produce other strings as we_l. 

25 As an example, considea: a function G of FIG. 

22, which is constructed to obfuscate the strings 
' 'AAA' ' , • 'BAAAA' ' , and ' = CCB ' . The values 
produced by G are G(l)-' ^AAA' ' , G(2)-' 'BAAAA' ' , 
G{3)=G(5)-' 'CCB' • , and G(4;-''XCB'' (which is not 
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actually used in the program) . For other argument 
values, G may or may not teirminate. 

Aggregating the computation of all static 
string data into just one function is, of course, 
5 highly undesirable. Much higher potency and 
resilience is achieved if the G- function was 
broken up into smaller components that were 
embedded into the ""normal'* control flow of rhe 
source program. 

10 It is interesting to note that we can combine 

this technique with the table interpretation 
transformation of Section 6". 2. 5. The intent of 
that obfuscation is to convert a section of Java 
bytecode into code for another virtual machine. 

15 The new code will typically be stored as static 
string data in the obfuscated program. For even 
higher levels of potency and resilience, hovvever, 
the strings could be converted to programs that 
produce them, as discussed above. 

20 

7.2 Aggregation Transformations 

In contrast to imperative and functional 
languages, object-oriented languages are more 
data-oriented than control -oriented. In other 

25 words, in an object -orienued program, the control 
is organized around the data structures, rather 
than the other way around. This means that an 
important part of reverse-- engineering an 
object-oriented application xs trying to restore 

30 the program's data structures. Conversely, iLt is 
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important for an obfuscator to try to hide these 
data structures. 

In most object -orienced languages, there are 
just two ways to aggregate data: in arrays and in 
5 objects. In the next three sections we will 

examine ways in which these data structures can be 
obfuscated. 

7.2.1 Merge Scalar Variables 

10 Two or more scalar variables . . . Vj^ can 

be merged into one variable Vj^ , provided the 
combined ranges of V-^ . . Vj, will fit within the 
precision of V„. For exaaipie, two 32 -bit integer 
variables could be merged into one 64 -bit 

15 variable. Arithmetic on the individual variables 
would be transformed into arithmetic on V„. As a 
simple example, consider merging two 32 -bit 
integer variables X and Y into a 64 -bit variable 
Z. Using the merging fornuila, 

20 

Z{X, Y) = 2 * Y + X 

we get the arithmetic identities in FIG. 23a. Some 
simple examples are given ia FIG. 23b. 

25 In particular; FIG. 23 shows merging two 

32 -bit variables X and Y into one 64 -bit variable 
Z. Y occupies the top 3 2 bits of Z, X the bottom 
32 bits. If the actual range of either X or Y can 
be deduced from the program, less intuitive merges 

30 could be used. FIG. 23a gives rules for addition 
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and multiplication with X and Y. FIG. 23b shows 
some simple examples. The example could be 
further obfuscated, for example by merging (2') 
and (3') into Z+-472446402G1 . 
5 The resilience of variable merging is quite 

low. A deobfuscator only needs to examine the set 
of arithmetic operations being applied to a 
particular variable in order to guess tha*c it 
actually consists of two ir.erged variables. V/e can 

10 increase the resilience by introducing bogus 
operations that could not correspond to any 
reasonable operations on the individual variables. 
In the example in F'IG. 23b, we could insert 
operations that appear to laerge S*s two halves, 

15 for example, by rotation; if (P^) Z = rotate{Z,5). 

A variant of this transformation is to merge 
Vi . . . Vj,. into an array 

V;, = i . . . k 

20 V, . . . V^, 

of the appropriate type. If V^j . . . Vj^ are object 
reference variables, for exarapie, then the element 
type of VA can be any class that is higher in the 
25 inheritance hierarchy than any of the types of 
Vi . . . V,. 

7.2.2 Restructure Arrays 

A number of transformations can be devised 
30 for obscuring operations pc^rfcrmed on arrays: for 
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example, we can split an array into several 
sub- arrays, merge two or more arrays into one 
array, fold an array (increasing the number of 
dimensions) , or flatten an array (decreasing the 
5 number of dimensions) . 

FIG. 24 shows some examples of array 
restructuring. In statements (1-2) an array A is 
split- up into two sub-arrays Al and A2 . Al holds 
the elements of A that have even indices, and A2 

10 holds the elemencs with odd indices. 

Statements (3-4) of ?IG. 24 show how two 
integer arrays 3 and C can be interleaved into a 
resulting array BC. The elements from B and C are 
evenly spread over the resulting array. 

15 Statements (6-7) derac.ns'::rate how a 

one-dimensional array D ca/i be folded into a two- 
dimensional array Dl. Star-ements (8-9), finally, 
demonstrate the reverse transformation: a uuo- 
dimensional array E is flatcened into a 

20 one-dimensional array El. 

Array splitting and folding increase the Ug 
data complexity metric. Ar::ay merging and 
flattening, on the other hana, seem to decrease 
this measure. Vfnile this may seem to indicate that 

25 these transformations have only marginal or even 

negative potency, this, in fact, is deceptive. The 
problem is that the complexity metrics of FIG. 7 
fail to capture an importaxit cispect of some data 
structure transformations : '^hey introduce 

30 structure where there was cjjriginally none or they 
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remove structure from the original program. This 
can greatly increase the obscurity of the program. 
For example, a programmer who declares a 
two-dimensional array does so for a purpose: the 
5 chosen structure somehow raaps cleanly to the data 
that is being manipulated. If that array is folded 
into a one-dimensional structure, a reverse 
engineer will have been deprived of mucli valuable 
pragmatic information . 

10 

7.2.3 Modify Inheritance Pelaticns 

In current object -ori&nted language such as 
the Java^*^ language, the n;ain modularization and 
abstraction concept is the, clc:.ss. Classes are 

15 essentially abstract data types that encapsulate 
data (instance variables) and control (methods) . 
We write a class as C = (V, M) , where V is the set 
of C*s instance variables ar.d M its methods. 

In contrast to the trdci'cional notion of 

20 abstract data types, two clasn^es and can be 
composed by aggregation (C^ na3 an instance 
variable of type C^) as v/ell as by inheritance (C2 
extends by adding nev; methods and ins-cance 
variables) . Vi^e write inheri-cance as C2 = Ci U C'2. 

25 C2 is said to inhex'it fr^m its super- or parent 
class. The U operator is the ::unction that 
combines the parent class with the new propex'ties 
defined in Cj- The exact seinaxitics of U depends on 
the particular programming language. In languages 

30 such as Java, U is usually incerpi'eted as union 
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when applied to the instance variables and as 
overriding when applied r.o methods. 

According to metric u-;, the complexity of a 
class Ci grows with its depth (distance from the 
5 root) in the inheritance liierarchy and the number 
of its direct descendants. For example, there are 
two basic ways in which we can increase this 
complexity: we can split up (factor) a class as 
shown in FIG. 25a or insert a nev\?, bogus, class as 
10 shown in FIG. 25b. 

A problem with class factoring is its low 
resilience; there is nothing stepping a 
deobfuscator from simply merging the factored 
classes. Tc prevent this, factoring and insertion 
15 are normally combined as shown in FIG. 25d. 

Another way of increasing the resilience of these 
types of transformations is to make sure that new 
objects are created of all introduced classes. 

FIG. 25c shows a variant of class insertion, 
20 called false refactoring. Refcictoring is a 

(sometimes automatic) technique for restructuring 
object-oriented programs whose structure has 
deteriorated (see Wiiliari F. Opdyke and Ralph E. 
Johnson. Creating abstract superclasses by 
25 refactoring. In Stan C. Kv/aijny and John F. Buck, 

editors. Proceedings of the 21st Annual Conference 
on Computer Science, pages 6G---73, New York, NY, 
USA, February 1993. ACM Press. 

ftp : / /st . cs ,uiuc . edu/pub/papers/ref actcring/ref act 
30 oring-superclasses.ps, incorporated herein by 
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reference). Refactoring is a tv/o~step process- 
First, it is detected that two, apparently 
independent classes, in fact implement similar 
behavior. Secondly, features common to both 
5 classes are moved into a new (possibly abstract) 
parent class. False refactoring is a similar 
operation, only it is performed on two classes 
and C2 that have no commori behavior. If both 
classes have instance varj.a:?.les of the 
10 same type, these can be rncved into the new parent 
class C3 . C3 ' s methods can be buggy versions of 
some of the methods fron-; and C2. 

7.3. Ordering Transf ormat ion^:? 

15 In Section 6.4 we showed that (when possible) 

randomizing the ox^der in v.'hich computations are 
performed is a useful obfuscation. Similarly, it 
is useful to randomize the; order of declarations 
in the source application, 

20 Particularly, we randomize the order of 

methods and instance variables within classes and 
formal parameters within iLtLchods. In the lacter 
case, the corresponding actuals will of course 
have to be reordered as well. The potency of these 

25 transformations is low and the resilience is 
one-way. 

In many cases it will also be possible to 
reorder the elements within an array. Simply put, 
we provide an opaque encoding function f (i) which 
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maps the i : th element in tha original array into 
its new position of the reordered array: 

{ ( 
5 int i=l, A[1000]; int i=l, A[1000]; 

while (i < 1000) ='^=> while {i < 1000) 

, . .A[i] . . . ; . . .A[f (i)] . . . ; 

i++; i++; 

} ) 

10 

8 Opaque Values and Predicates 

As we have seen, opaque predicates are the 
major building block in the design of 
transformations that obfuscate control flov/. In 

15 fact, the quality of most control transformations 
is directly dependent on the quality of such 
predicates . 

In Section 6 . 1 we gave examples of simple 
opaque predicates v/ith trivial and weak 

20 resilience. This means that che opaque predicates 
can be broken (an automatic deobfuscator could 
determine their value) using local or global 
static analysis. Obviously, we generally require a 
much higher resistance to attack. Ideally, we 

25 would like to be able to constxuct opaque 

predicates that require wcrst case exponential 
time (in the size of the program) to break but 
only polynomial time to construct. In this section 
we will present two such techniques. The first one 
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is based on aliasing and the second is based on 
lightweight processes. 

8-1 Opaque Constructs Using Objects and Aliases 
5 Inter-procedural static analysis is 

significantly complicated whenever there is a 
possibility of aliasing. In fact, precise, 
flow-sensitive alias analysis is undecidable in 
languages with dynamic allocation, loops, and 
10 if -statements . 

In this section we v;ill exploit the 
difficulty of alias analysis to construct opaque 
predicates that are cheap and resilient to 
automatic deobf uscaticn attacks. 

15 

8.2 Opaque Constructs Using Threads 

Parallel programs are more difficult to 
analyze statically than their sequential 
counterparts. The reason is their interleaving 

20 semantics: n statements in a parallel region PAR 
Si, S2, . . . , Sn, ENDPAI^ can be executed in n! 
different ways. In spite of this, some 
static analyses over parallel programs can be 
performed in polynomial time [13] , while others 

25 require all nl interleavmcB to be considered. 
In Java, parallel regions are constructed 
using lightweight processes known as threads. Java 
threads have (from our poinc of view) two very 
useful properties: (1) their scheduling policy is 

30 not specified strictly by the language 
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specification and will hence depend on the 
implementation, and (2) the actual scheduling of a 
thread will depend on asynchronous events, such as 
generated by user interaction, and network 
5 traffic. Combined with the inheirent interleaving 
semantics of parallel regions, this means that 
threads are very difficult to analyze statically. 

We will use these observations to create 
opaque predicates (see FIG. 32) that will require 

10 worst-case exponential time tc break. The basic 
idea is very similar to the one used in Section 
8.2: a global data structure V is created and 
occasionally updated, but kept in a S'Ccite such 
that opaque queries can be made. The difference is 

15 that V is updated by concurrently executing 
threads . 

Obviously, V can be a dynaifiic data structure 
such as the one created in FIG. 26, The threcids 
would randomly move the global pointers g and h 

20 around in their respective components, by 

asynchronously executing calls to move and insert. 
This has the advantage of comhining data races 
with interleaving and aliasing effects, for very 
high degrees of resilience. 

25 In FIG. 27, we illustrate these ideas with a 

much simpler example where V is a pair of global 
integer variables X and Y. It is based on the 
well-known fact from elementary nuvaber theory 
that, for any integers x and y, 7y'' -1 does not 

30 equal x^. 
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9 Deobfuscation and Preventive Transformations 

Many of our obfuscating transformations 
(particularly the control transformations of 
5 Section 6.2) can be said to embed a bogus program 
within a real program. In other words, an 
obfuscated application really consists of two 
programs merged into one: a real program which 
performs a useful task and a bog-us program vmich 

10 computes useless information. The sole purpose of 
the bogus program is to confuse potential reverse 
engineers by hiding the real program bfihind 
irrelevant code . 

The opaque predica::.xi is the main device the 

15 obfuscator has at its disposal no prevent the 

bogus inner program fron being easily identified 
and removed. For exampi:-\ i.n FIG. 28a, an 
obfuscator embeds bogus coda protected by opaque 
predicates within three statements of a real 

20 program. A decbfuscator ' s tasK is to examine the 
obfuscated application and aucoma-cically identify 
and remove the inner bogas program. To accomplish 
this, the deobfuscator muse first identify and 
then evaluate opaque construccs. This process is 

25 illustrated in FIGs.28b through 23d. 

FIG. 2 9 shows the anatonr/ of a semi-automatic 
deobfuscation tool. It incorporates a number of 
techniques that are well knova'i in the reverse 
engineering community. In tha remainder of this 

3 0 section we will briefly reviev/ some of these 
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techniques and discuss various counter-measures 
(so called preventive transformations) that an 
obfuscator can employ to make deobf uscation more 
difficult. 

5 

9.1 Preventive Transformations 

Preventive transformations, which are 
discussed above with respec" to FIG. 2g, are quite 
different in flavor from ccntrol or data 

10 transformations. In contraist to these, their main 
goal is not to obscure the program to a human 
reader. Rather, they are designevd to make known 
automatic deobf uscation techniques more difficult 
(inherent preventive tranaf ormations) , or to 

15 explore known problems in cvrxexrc deobf uscators or 
decompilers (targeted pi'evoLi-_ti-.,'e transformations) . 

9.1.1 Inherent Preventive Transformations 

Inherent preventive transformations will 

20 generally have low potency and high resilience. 
Most importantly, they wil^ har/C the ability to 
boost the resilience of otl:er transformations. As 
an example, assume thac vve have reordered a 
for-loop to run backwards, as suggested in section 

25 G,4. We were able to apply this "cransf oriuation 

only because we could deteru.ine that the loop had 
no loop-carried data dependencies., Naturally, 
there is nothing stopping a deobf uscator from 
performing the same analysis and then returning 

30 the loop CO forward execution. To prevent this, we 
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can add a bogus data dependency to the reversed 
loop: 



{ 

5 for ( i=l ; i<=10 ; 1++) =T= 
A[i] =i 

} 



int B [50] ; 
for (i=10;i<=l;i--) 
-i; 

B fi] +=B [i^*^i/2] 



10 

The resilience this inherent preventive 
transformation adds to the loop reordering 
transformation depends on the complexity of the 
bogus dependency and the state-of-the-art in 
15 dependency analysis [3 6] . 



9.1.2 Targeted Preventive Transformations 

As an example of a targeted preventive 
transformation, consider the HoseMocha program 

2 0 (Mark D . LaDue . KoseMocha . ht tp : //v/ww . xynyx . 

demon.nl/java/HoseMccha.java, January 1997). It 
was designed specifically to explore a weakness in 

the Mocha (Hans Peter Van Vliet- Mocha The 

Java decompiler, http: //web, inter.nl . net/users/H . 

25 P. van. Vliet /mocha. html , January 1996) decompiler. 
HoseMocha inserts extra inscrucrions after every 
return- statement in every method in the source 
program. This transformation has no effect on the 
behavior of the application, but it is enough to 

3 0 make Mocha crash. 
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9.2 Identifying and Evaluating Opaque Constructs 
The most difficult part of deobf uscation is 
identifying and evaluating opaque constructs. Note 
5 that identification and evaluation are distinct 
activities. An opaque constrtict can be local 
(contained within a single basic block) , global 
(contained within a single procedure) , or 
inter-procedural (distributed throughout the 

10 entire program) . For example, if (x * x (7^ * y 
* y -1) ) is a local opaque p:scicace, whereas 
R=X*X; . . . ;S-7*y*y-l; . . . ; it (R ===3^). . .is 
global. If the computation of R rind S were 
performed in different procedures, the cor:Scruct 

15 would be inter-procedural. Obviously, 

identification of a local opaque predicate is 
easier than identif icatic:i of an inter-procedural 
one. 

20 9.3 Identification by Pafcerr. Hatching 

A deobfuscator can use kr^owledge of the 
strategies employed by know:.i obfuscators to 
identify opaque predicates. A designer of a 
deobfuscator could examine an obfuscator (either 

25 by decompiling it or simply by examining the 
obfuscated code it generates) and. construct 
pattern-matching rules that can identify commonly 
used opaque predicates. This mechod will work best 
for simple local predicates, such as x * x (7 * 

30 y*y-l) or random (1^,5) < 0 
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To thwart attempts at pattern matching, the 
obfuscator should avoid using canned opaque 
constructs. It is also important to choose opaque 
constructs that are syntactd.cally similar to the 
5 constructs used in the real application. 

9.4 Identification by Program Slicing 

A programmer will generally find the 
obfuscated version of a prograiri more difficult to 

10 understand and reverse engineer than the original 
one. The main reasons are thau in the obfuscated 
program (a) live "''real" code v;ill be interspersed 
with dead bogus code, and (b) logically related 
pieces of code will have been broken up and 

15 dispersed over the prograir. , Program slicing tools 
can be used by a reverse engineer to counter these 
obfuscaticns . Such tools can interactively aid the 
engineer to decompose a program in'co manageable 
chunks called slices. A slice of a program ? with 

20 respect to a point p and a variable v consists of 
all the statements of P that could have 
contributed to v*s value at p. Hence, a program 
slicer would be able to exciact from the 
obfuscated program the statements of the algorithm 

25 that computes an opaque variable v, even if the 

obfuscator has dispersed these statements over the 
entire program. 

There are several strategies available to an 
obfuscator to make slicing a less useful 

30 identification tool: Add parameter aliases A 
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parameter alias is two formal pa^rameters (or a 
formal parameter and a global variable) that refer 
to the same memory location. The cost of precise 
inter-procedural slicing grov;e i'^ith the nu*fnber of 
5 potential aliases in a program, which in turn 
grows exponentially with the number of formal 
parameters. Hence, if the obfuscator adds aliased 
dummy parameters to a progre^m it v»'ill either 
substantially slow down the slicer (if precise 
10 slices are required) , cr force ohe slicer to 
produce imprecise slices (if fast slicing is 
required) . 

Add variable dependencies, as popular slicing 
tools such as Unravel (James R. Lyle, Dolorres R. 

15 Wallace, James R. Graham, Keith B. Gallagher, 

Joseph P. Poole, and David I'J Binkley. Unravel: A 
CASE tool to assist evaluacion of high integrity 
software. Volume 1: Requirerr-'ent;^ and design. 
Technical Report NIS- 

20 TIR 5691, U.S. Department of Commerce, August 
1995) work well for small slices, but will 
sometimes require excessive time to compute larger 
ones. For example, when vvorking on a 4 00C line C 
program, Unravel in some cases required over 3 0 

25 minutes to compute a slice. Tc i-orce this 
behavior, the obfuscator should attempt to 
increase slice sizes, by adding oogus variable 
dependencies. In the example below, we have 
increased the size of the slice computing 
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X by adding two statement in which apparently 
contribute to x's value, but v/hich, in fact, do 
not . 



5 mainO { =^=> mainO { 

int x=l; int x=l; 

X = X * 3; if (P^ ) X++; 

} X - X - VO ; 

X - X -fr 3; 

10 ) 



9.5 Statistical Analysis 

A deobfuscator can instruraent an obfuscated 
program to analyze the outcome of all predicates. 

15 We will call any deobf uscatiovi method that 
examines the run-time charaat::ristics of an 
obfuscated application in this way, Statistical 
Analysis. The deobfuscator would alert the reverse 
engineer to any predicace that always returns the 

20 same truth value over a large number of tesc runs, 
because they may turn cut co be an opaque P*^ (?^ ) 
predicate. The deobfuscator couid not blindly 
replace such predicates v/ith True (False) , because 
this would be too dangerous. Many applications 

25 will contain ^^real'' predicates that check for 
conditions that only happen tinder exceptional 
circumstances, and to the deobfuscator they will 
appear to behave identically/ zo an opaque 
predicate. As an example, consider pif (Leap Year) 

30 .... 
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Statistical analysis can also be used for 
evaluation. When a potential opaque predicate 
(e.g., P*^) in a program M has been identified, we 
guess its value (True) , and make a version M' of 
5 the obfuscated program where the opaque predicate 
has been replaced by the guessed value. We then 
run M and M' in parallel on t?ie same input, and 
compare to see that they produce identical output. 
If the outputs are the same, we can conclude that 

10 the predicate was part of the bogus, not the real, 
application, as shov/n in P'lG, 30. 

We have to make sure that our chosen inputs 
adequately cover ail paths in the program. Again, 
if the program contains pa'chs that are rarely 

15 taken (if (Leap Year) . . .) this will be 

difficult. Furthermore, generating large numbers 
of correct input/output dar.a is very difficult, 
particularly when int-emai structure of the 
application is unknown, or the input is encered 

20 (as is often the case v.^ith Java programs) through 
a complex graphical user interface. 

To prevent identification by statistical 
analysis, the obfuscator may choose to favor 
transformations thac insert predicates (such as 

25 the one shown in FIG. lib) over chose that insert 
P*^ or predicates. 

Another possible counter- n.easure against 
statistical analysis is to design opaque 
predicates in such a v;ay thac several predicates 

3 0 "have to be cracked at the £>ame ciiue. One v/ay of 
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doing this is to let the. opaque predicates have 
side-effects. In the exann.ple be-low the obfuscator 
has determined (through some sort of static flow 
analysis) that statements S-, and 83 must always 
5 execute the same number of tiT.es. The statements 
are obfuscated by introducing opaque predicates 
which are calls to functions and Qi- and Q2 
increment and decrement a glc.'bal variable k: 

10 { =^=> { 

Si; in'C k^^O; 

S2; bool Q-^ i^] { 

} k+-2'' ; return {p'^^ ) } 

bocl Q2 (x) [ 
15 k--2'" ; return (P^2 )/ 

{ 

if (Qi (j) T ) Si ; 

if ;Q, (k) T ) S2 ; 
20 } 

If the deobfuscator tries co replace one (but 
not both) predicates with True^ k will overflow. 
As a result, the deobfuscated program v/ill 
25 terminate with an error condition. 

9.6 Evaluation by Data-Flow Analysis 

Deobfuscation is similar to many types of 
code optimization. Removing if (False) ... is 
30 dead code eliminacion and moving identical code 
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from if -statement branches (e.g., S^^ and S^^ in 
FIG. 28) is code hoisting, both common code 
optimization techniques. 

When an opaque construct has been identified 
5 we can attempt to evaluate it. In simple cases 
constant propagation using a reaching definition 
data-flow analysis can be sufficient: x=5; . . 
.;y=7;. . .;if (x*x-- ( 7*7-7-1 ) ) .... 

10 9.7 Evaluation by Theorem Proving 

If data-flow analysis is not powerful enough 
to break the opaque predicate, a deobfuscator can 
attempt to use a theorem prover. Whether zhis is 
doable or not depends on the power of 

15 state-of-the-art theorem prove?:s (which is 

difficult to ascertain) and the complexity of the 
theorem that needs to be prov^en. Certainly, 
theorems that can be proved by induction (such as 
x^(x + 1)^ 0 (mod 4/), c"ire well within reach of 

20 current theorem prover s . 

To make things more difficult, we can use 
theorems which are know to be difficult to prove, 
or for v;hich no known proof exists. In the example 
below the deobfuscacor v/ili ha'/e to prove that the 

25 bogus loop always terminates in cx'der to determine 
that S2 is live code: 

{ { 

Si ; ^1/ 
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S 



ri = random ( 1 , 2"^^ ) ; 



) 



do 



n - { (n%2) 1=0) ?3*n+l:n/2 



while (n>l) ; 




This is known as the Collati: problem. A conjecture 
says that the loop will always terminate. Although 
there is no known proof of this conjecture, the 
code is known to tenviinate for all numbers up to 
7*10^^. Thus, this obfuscaticn. is safe (the 
original and obfuscated code behave identically) , 
but is difficult to deobfuscate. 

9.8 Deobfuscation and Partial Evaluation 

Deobf uscation also reseir.bles partial 
evaluation. A partial evaluacor splits a program 
into two parts: the static part which can be 
precomputed by the partial evaluator, and the 
dynamic part which is executed at runtime. The 
dynamic part would correspond to our original, 
unobfuscated, program. The static part would 
correspond to our bogus inner program, which, if 
it were identified, could be evaluated and removed 
at deobfuscation time. 

Like all other static inter-procedural 
analysis methods, partial evaluation is sensitive 
to aliasing. Hence, the same preventive 
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transformations that were discussed in relation to 
slicing also applies to partial evaluation, 

10 Obfuscation Algorithms 
5 Given the obfuscator architecture of Section 

3, the definition of obfuscation quality in 
Section 5, and the discussion of various 
obfuscating transformations in Section 6 through 
Section 9, we are now in a position to present 
10 more detailed algorithms, in accordance with one 
embodSiment of the present invention. 

The top-level loop of an obfuscation tool can 
have this general struct:ure; 

15 WHILE NOT Done (A) DO 

S : = SelectCcde (A) ; 

T SelectTransf orm (3) ; 

A Apply (T ,S) ; 

END; 

20 

SelectCode returns the next source code object to 
be obfuscated. SelectTransf orm returns the 
transformation which shoixld be used to obfuscate 
the particular source code object. Apply applies 

25 the transformation to the source code object and 
updates the application accordingly. Done 
determines when the required level of obfuscation 
has been attained. The conplexity of these 
functions will depend on the sophistication of the 

30 obfuscation tool. At the simplistic end of the 
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scale, SelectCode and SelectTransf orm could simply 
return random source code ohject/transf ormations , 
and Done could terminate the loop when the size of 
the applica- 
tion exceeds a certain limit. Normally, such 
behavior is insufficient. 

Algorithm 1 gives a description of a code 
obfuscation tool with a m-uch mere sophisticated 
selection and terininaticn behavxor. In one 
embodiment; the algorithm ir.akes use of several 
data structures, which are constructed by 
Algorithm's 5 , 6 , and 7 : 

Pg For each source cede object ?s{S) 
is the set or langucige constructs the 
programmer used m ^-^s(£) is used to 
find appropriate obfuscating 
transformations for S. 

A For each source code object S, A(S) = 
(Ti --> V, ; . . . ; Vj is a 

mapping from transf oiriviations Ti to values 
V^, describing hov; appropriate 
it would be to apply to S. The idea is 
that certain transformations may be 
inappropriate for a particular source 
code object S, because they introduce 
new code which is " "unnatural ' ' to S. 
The new code would look out of place in 
S and hence would be easy to spot for a 
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reverse engineer. The higher the 
appropriateness value the better the 
code introduced by transformation will 
fit in. 

5 

I For each source code object S, I(S) is 
the obfuscation priority of S. I{S) 
describes how iraporrcait it is to 
obfuscate the contents of S. If S 
10 contains an imporr.ant trade secret then 

I (S) will be high, if it contains mainly 
"bread-and-butter'' code I{S) will be 
low. 

15 R For each routine M .. R{M) is the 

execution tirae rank of M . R(M) = 1 if 
more time is spent executing M than any 
other routine. 

20 The primary input to Algorithm 1 is an 

application A and a set of obfuscating 
transformations ( ; T2 ; - . . } . The algorithm 
also requires information regarding each 
transformation, particularj,y three quality 

25 functions T^eg(S), Tpot(S), and T^o3t(S) (similar to 
their namesakes in Section 5, but returning 
numerical values) and a function Ft : 

Tres(S) retur*ns a measure of the resilience of 
30 transformation T when applied to source code 
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object S (i.e., how well T v;ill withstand an 
attack from an automatic di2obf uscator) . 

Tpot(S) returns a measure of the potency of 
transformation T v/hen applied to source code 
object S (i.e., how m.uch more difficult S 
will be for a human 

to understand af'cer having been obfuscH^ced by 
T) . 

Tcost'S) returns a measure of the execution 
time and space penal\:y r.dded by T to S . 



Pt maps each transformation T to the set of 
15 language constructs tliat T v.dll add zo 

application. 



10 



Points 1 to 3 of Algcrichtr. 1 load the application 
to be obfuscated, and. bv:.ilds ajr;:ropriate internal 

20 data structures. Point 4 builds ^^{S) , A(S) , I(S), 
and R(M ) . Point 5 applies obfuscating 
transformations uncil the required obfuscacion 
level has been attained or until the maximum 
execution time penalty is exceeded. Point 6, 

25 finally, rewrites the new apxDiication A' . 

Algorithm 1 (Code Obfuscation) 

input: a) An application made up of source 
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code or object code files CI ; C2 ; . . 

b) The standard libraries LI ; L2 ; . . 
. de- 

5 fined by the language. 

c) A set of obfuscating transformations 
{Tl ; T2 ; . . . } . 

d) A mapping Pt v;hich, for each 
transformation T gives the set of 

10 language constructs tihat T v/ill add to 

the application. 

e) Three functior-:; 'C^^^iS) , Tpcc(S). 
Tcost'S) expressing the quality of a 
transformation T with respect to a 

15 source code object 3. 

f) A set of input data I = {ll; 12; . . 
. } to A. 

g) Tv/o numeric valines AcceptCost>0 and 
ReqObf>0. Accept ::os^^ is a measure of the 

20 maximum extra execution time/space 

penalty the user \viil accept. ReqObf is 
a measure of the amount of obfuscation 
required by the user . 

25 output: An obfuscated application A' raade up 
of source code c:: object code files. 

1. Load the application C-l ; ; . . . to be 
obfuscated. The obfusca'cor could either 
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(a) load source code files, in which case the 
obfuscator would have to contain a complete 
compiler front-end performing lexical, 

5 syntactic, and semantic an^ilysis, {a less 

powerful obfuscator that restricts itself to 
purely syntactic transformation could manage 
without semantic analysis) or 

10 (b) load object cede files. If the object 

code retains most or all of the information 
in the source cede {as is -he case with Java 
class files) , this method is preferable. 

15 2. Load library code files LI; L2 ; , . . 
referenced directly or indirectly by the 
application. 

3. Build an internal representation of the 
2 0 application. The choice of internal representation 
depends on the structure of the source language 
and the complexity of the transformations the 
obfuscator implements. A t^^'pical set of data 
structures might include : 
25 (a) A control -flow graph for each routine in 

A. 

(b) A call-graph for the routines in A. 

(c) An inheritance graph for 'che classes in 
A. 
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4. Construct mappings R(M) and P^iS) (using 
Algorithm 5); I (S) (using Algorithm 6), and A(S) 
(using Algorithm 7) . 

5 

5. Apply the obfuscating transformations to the 
application. At each step we select a source code 
object S to be obfuscated and a suitable 
transformation T to apply to 3. The 

10 process terminates when the required obfuscation 
level has been x'eached or the accept- 
able execution time cost has been exceeded. 

REPEAT 

15 S := SelectCode (I) ; 

T := SelectTransf orra (S, A) ; 
Apply T to S and update relevant data 
structures from point 3; 
UNTIL Done(ReqObf, AcceptCost, S, T , I) . 

20 

6. Reconstitute the obfuscated source code objects 
into a new obfuscaced application, A' . 

Algorithm 2 (SelectCode) 

25 

input: The obfuscation priority mapping I as 
computed by Algorithm 6 . 

output: A source code objecu S. 
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I maps each source cods object S to I (S) , 
which is a measure of hov; important it is to 
obfuscate S. To select the next source code object 
5 to obfuscate, we can treac I as a priority queue. 
In other words, we select S so that I (S) is 
maximized. 

Algorithm 3 {Select Transform) 

10 

input: a) A source code object: S. 

b) The appropri-ateness mapping A ais 
computed by Algorithm 7 . 

15 output: A transformation T , 

Any number of heuristics can be used to 
select the most suitable transformation to apply 
to a particular source code object S. However, 

20 there are tv/o importanc issues to consider. 

Firstly, the chosen transformation must blend in 
naturally with the rest of the code in S. This can 
be handled by favoring transformations with a high 
appropriateness value in A(S) . Secondly, we want 

25 to favor transformations vvhicli yield a high 
' bang- for- the "buck ■ (i.e. high levels of 
obfuscation with low execution time penalty) . This 
is accomplished by selecting transformations that 
maximize potency and x-esilience, and minimize 

30 cost. These heuristics are captured by the 
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following code, where wl, w2, v/3 are 
implementation-defined constants : 

Return a transform T , such that 
5 T --> V is within A(S), and 

{wl*Tpot{S) + w2*T^es(S)+w3*V) / T,^st(S) 
is maximized. 



Algorithm 4 (Done) 

10 

input : 

a) ReqObf, the remaining level of 
obfuscation. 

b) AcceptCost, the remaining acceptable 
15 execution time penalty. 

c) A source code object 3. 

d) A transformation T . 

e) The obfuscation priority aiapping I . 



2 0 output : 

a) An updated ReqObf. 

b) An updated AcceptCost. 

c) An updated obfuscation priority mapping I. 

d) A Boolean return value which is TRUE if 
25 the termination condition has been 

reached. 



The Done function serves two purposes. It 
updates the priority queue I to reflect the fact 
30 that the source code object S has been obfuscated, 
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and should receive a reduced priority value . The 
reduction is based on a combination of the 
resilience and potency of the transformation. Done 
also updates ReqObf and AcceptCost, and determines 
5 whether the termination condition has been 

reached, w^, W2, W3 , v/4 ai:e iTP.plementation-def ined 
constants : 

I{S) I(S) - (W2Tp^3(S) -f V72Tres(S)); 

10 ReqObf := ReqObf - {w/Ip^glS) + WsT^es^S)); 

AcceptCost := AcceptCost -Tcost(s); 
RETURN AcceptCost <- 0 OR ReqObf 0. 

Algorithm 5 (Pragmatic Information) 

15 

input : 

a) An application A. 

b) A set of input: data I = { II ? 12 ; . . - } 
to A. 

20 

output : 

a) A mapping R(M) which, for every routine M 
in A, gives che execution time rank of M . 

b) A mapping P s (S) , which, for every 
25 source code object S in A, gives zhe set 

of language constructs used in S. 

Compute pragmatic information. This 
information will be used to choose the right type 
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of transformation for each particular source code 
object . 

1. Compute dynamic pragmatic information 

5 (i.e., run the application under a profiler 

on the input data set I provided by the user. 
Compute R(M) (the execution time rank of M) 
fcr each routine/baaic block, indicating 
where the application spends most of its 
10 time. 

2. Compute static pragmatic information Ps(S). 
Pg(S) provides statistics on the kinds of 
language constructs the programmer used in S. 

15 

FOR S := each source code objec:: in A DO 

O The set of operators that S uses; 

C The set of high-level language 

constructs (WHILE stateu:ents , 
20 except ions / threads, etc.) that S uses; 

L The set of library classes/rou-cines 

that S references; 

PS(S) := O U C U h; 
END FOR. 

25 

Algorithm 6 (Obfuscation Frioriuy) 

input : 

a) An application A. 
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b) R{M ), the rank of M. 

output: A mapping I(S) which, for each source 
code object S in A, gives the obfuscation 
5 priority of B . 

I (S) can be provided explicitly by the user, 
or it can be computed using a heuristic based on 
the statistical data gathered in Algorithm 5. 
10 Possible heuristics might be r 



1. For any routine M in A, let I (M) be 
inversely proportional to the rank of M , R(M 
). I.e. the idea is that ""if much time is 

15 spent executing a roucine M , then M is 

probably an important procedure that should 
be heavily obfuscated. - * 

2. Let I (S) be the complexity of S, as 

20 defined by one of the software complexity 

metrics in Table 1. Again, the (possibly 
flawed) intuition is thae complex cede is 
more likely to contain important trade 
secrets than simple code. 

25 

Algorithm 7 (Obfuscation Appropriateness) 

input : 

a) An applicai:ion A. 
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b) A mapping P t which, for each 
transformation T , gives the set of language 
constructs T will add to the application. 

c) A mapping P s (S) which, for each source 

5 code object S in A, gives the set of language 

constructs used in S. 



output: A mapping A{S) which, for each source code 
object S in A and each transformation T , 
10 gives the appropriateness of T with respect 

to S. 

Compute the appropriateness set A(S) for each 
source code object S. The mapping is based 
15 primarily on the static pragmatic information 
computed in Algorithm 5. 



FOR S := each source code object in A DO 
FOR T := each transformation DO 
20 V := degree of sirailarity between 

Pt (T) and Ps (S) ; 

A(S) := A(S) U {T > V} ; 

END FOR 
END FOR 

25 

11 Summary and Discussion 

We have observed that it may under many 
circumstances be acceptable for an obfuscated 
program to behave differently than the original 
30 one. In particular, most of our obfuscating 
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transformations make the target program slower or 
larger than the original. In special cases we even 
allow the target program to have different 
side-effects than the original, or not to 
5 terminate when the original program terminates 
with an error condition. Our only requirement is 
that the observable behavior (the behavior as 
experienced by a user) of the two progxams should 
be identical. 

10 Allowing such weak equivalence between 

original and obfuscated prograin is a novel and 
very exciting idea.. Although various 
transformations are provided and described above, 
many other transf ormationr:; will be apparent to one 

15 of ordinary skill in the ar'c aiid can be used to 

provide obfuscation for enhanced software security 
in accordance with -che present invention. 

There is also great poj-ei^tiai for much future 
research to identify trantf cruu;T:ions not yet 

20 known. In particular; we would like to see the 
following areas investigated: 

1. New obfuscating transformations should be 
identified. 

25 

2. The interaction aitd oxdering betv;een 
different txansformacicns should be studied. 
This is similar to work in code optimization, 
where the ordering of a sequence of 
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optimizing transformations has always been a 
difficult problem. 

3 . The relationship between potency and cost 
5 should be studied. For a particular kind of 

code we would like to Icnow which 
transformations would give the best 
"bang-for-the-buck" ( i.e., the highest 
potency at the lowest execution overhead) , 

10 

For an overview of all the transformations 
that have been discussed abo'/e^ see FIG. 31. For 
an overview of the opaque constructs that have 
been discussed above, see FIG. 32. However, the 
15 present invention should not be limited to the 
exemplary trans format ions and opaque constructs 
discussed above. 

11.1 The Power of Obf uscat;.r;n 

20 Encryption and program cbfuscation bear a 

striking resemblance to each other. Not only do 
both try to hide information from prying eyes, 
they also purport to do so for a limited cime 
only. An encrypted document has a limited 

25 shelf -life: it is safe only for as long as the 

encryp^cion algorithm itsel:': withstands attack, and 
for as long as advances in hardware speed do not 
allow messages for the chosen key- length to be 
routinely decrypted. Tne same is true fox* an 

30 obfuscated application; it remains secret only for 
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as. long as sufficiently powerful deobf uscators 
have yet to be built. 

For evolving applications this will not be a 
problem, as long as the time between releases is 
5 shorter than the cime it tiikes for the 

deobf uscator to catch up with the obfuscator. If 
this is the case, then by the time an application 
can be automatically deobfMScated it is already 
outdated and of no interest to a competitor. 

10 However, if an application contains trade 

secrets that can be assumed to survive several 
releases, then these shoul'I h^:. protected by means 
other than obf usccLtion . Pc.:rtial server-side 
execution (Figure 2(b)) see-.'us the obvious choice, 

15 but has the drav;back czhat ■:-he application will 

execute slowly or (when the netv;ork connection is 
down) not at all. 

11.2 Other Uses of Obf asca.-x:.-ii 

20 It is interesting to MOte that there may be 

potential applications of: ob'fuscation other than 
as discussed above. On=; porisibility is to use 
obfuscation in order to crace software pirates. 
For example, a vendor creates a new obfuscated 

25 version of his application for every new customer 
(We can generate difieranc obfuscated versions of 
the same application by inl:roducing an element of 
randomness into the Select Ir^nsform algorithm 
(Algorithm 3) . Different s^:ed3 to the randoai 

30 number generator will produce different versions.) 
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and keeps a record of to v/hora each version v/as 
sold. This is probably only reasonable if the 
application is being sold and distributed over the 
net.' If the vendor finds out that his application 
5 is being pirated, all he nsads to do is to get a 
copy of the pirated version, compare it against 
the data base, and see who bought the original 
application. It is, in fa.cz ^ i-ot necessary to 
store a copy of every obfuscated version sold. It 
10 suffices tc keep the randon^. nuiT.ber seed that was 
sold. 

Software pirates could uheaiiSelves make 
(illicit) use of cbfuscaticn. Because the Java 
obfuscator we outlined abcv3 v;c:.:k3 at the bytecode 

15 level, there is nothing sLo}.:ping a pirate from 

obfuscating a legally bought Java application. The 
obfuscated version could then be resold. When 
faced with litigation the pirate could argue that 
he is, in fact, not reselling one application that 

20 he originally bought (after al^, the code is 
completely differentl), but racher a legally 
reengineered version. 



25 Conclusion 

In conclusion, the present invention provides a 
computer implemented method and apparatus fox" 
preventing, or at least hampering, reverse engineering 
of software. While this may be effected at the expense 

3 0 of execution time or program size with the resulting 
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transformed program behaving differently at a detailed 
level, it is believed thar, the present technique 
provides significant utility in appropriate 
circumstances. In one embodiment, the transformed 
5 program has the same observable behavior as the 
untransformed program. Accordingly, the present 
invention allows for such weak equivalence between the 
original and obfuscated program. 

While the present discussion has been primarily in 

10 the context of hampering reverse engineering of 

software, other applications ^re contemplated such as 
watermarking sofcv;are objz':zs (including applications). 
This exploits the po\ientially distinctive nature of any 
single obfuscation procedure. A vendor would create a 

15 different obfusca-ied version of an application for 

every customer sold. If piraLze copies are found, the 
vendor need only compai'e it: against the original 
obfuscation information database to be able to trace 
the original application, 

20 The particular obfuscation transformations 

described herein are noc e^xhaustive. Further 
obfuscation regimes may b'i identified and used in the 
present novel obfuscation tool architecture. 

Where in the foregoing description reference has 

25 been made to elements or mcegers having known 

equivalents, then such equivalents are included as if 
they were individually sec forth. 

Although the presenu inv^ention has been described 
by way of example and with reference to particular 

30 embodiments. XL is to oe understood that modifications 
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and improvements can be made vrithout departing from the 
scope of the present invention. 
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CLAIMS 

What is claimed is: 

5 1. A computer implemented method for obfuscating 

code , comprising : 

selecting a subset of the code to obfuscate; 
selecting an obfuscating transform to apply; 

and 

10 applying the transformation, wherein the 

transformed code provides weak equivalence to the 
untransf ormed code , 

2. The comp\iter implemented method of Claim 1, 
15 further comprising: 

identifying one or more source code input 
files corresponding to source code for the code of 
an application to be piocessed; 

selecting a required level of obfuscation 
20 (the potency) ; 

selecting a maximum execution time or space 
penalty (the cost) ; 

reading and parsing the input files; 
providing information identifying data types, 
25 data structures, and control structures used by 

the application to be prccessed; 

selecting and applying obfuscating 
transformations to source code objects until the 
required potency has been achieved or the maximum 
30 cost has been exceeded; and 
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outputting the transformed code of the 
application. 

3. The method of Claiim 1, wherein the 

5 transformation compritses an opaque construct, the 

opaque construct being constructed using aliasing and 
concurrency techniques. 

4. The method of Claim 1, further comprising: 
10 outputting information about obfuscating 

transformations applied to the obfuscated code and 
information relating obfuscaced code of a 
transformed applicacirjn to source code of the 
application . 

15 

5. The method of Claini 1, wherein the 
transformation is seleccsd t*.o preserve the observable 
behavior of the cods of an application. 

20 6. The method of 21c-im 1, further comprising: 

deobf uscating the code, the decbf uscating the 
code comprising removing any obfuscations from the 
obfuscated code of an application by use of 
slicing, partial evaluation, dataflow analysis, or 

25 'statistical analysis. 

7. A computer prograra embodied on a computer- 
readable medium for obfuscating code, comprising: 

logic that selects a subsec of the code to 
30 obfuscate; 
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logic that selects an obfuscating transform 
to apply; and 

logic that applies the transformation, 
wherein the transformed code provides weak 
5 equivalence to the untransf ormed code. 

8. The computer program of Claim 1, furtiier 
comprising: 

logic that identifies one or more source code 
10 input files corresponding to source code for the 

code of an application to be processed; 

logic that selects a required level of 
obfuscation (the potency) ; 

logic that seleccs a inaximum execution time 
15 or space penalty (the cost} ; 

logic that reads and parses the input files; 
logic that provides information identifying 
data types, data structures, and control 
structures used by the application to be 
20 processed; 

logic that selec^:.s and applies obfuscating 
transformations to source code objects until the 
required potency has been achieved or the maximum 
cost has been exceeded; and 
25 logic that outputs the transformed code of 

the application. 

9. The computer program of Claim 7, wherein the 
transformation comprises an. opaque construct, the 
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opaque construct being cons t:irac ted using aliasing and 
concurrency techniques . 

10. The computer program of Claim 1 , further 
5 comprising: 

logic that outputs information about 
obfuscating transformations applied to the 
obfuscated code and inf oriiiation relating 
obfuscated code of a tx'ans formed application to 
10 source code of the application. 

11. The computer progx-n'i of Claim 7, wherein the 
transformation is selected to preserve the observable 
behavior of the code of a:i application. 

15 

12. The compu'Ler prograra of Claim 7, further 
comprising: 

logic that deobfuscatas the code, the 
deobfuscating the code comprising removing any 
20 obfuscations from the obfuscated code of an 

application by use of slicing, partial evaluation, 
dataflow analysis, or statistical analysis. 

13. An apparatus fo;r obfuscating code, 
25 comprising: 

means for seleccirig a subset of the code to 
obfuscate; 

means for select iixg an obfuscating transform 
to apply; and 
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means for applying the transformation, 
wherein the transformed code provides weak 
equivalence to the uncransf ormed code. 

5 14. The apparatus of Claln 13, further 

comprising: 

means for identifying one or more source code 
input files corresponding to source code for the 
code of an applicacicn to be processed; 
10 means for selecting required level of 

obfuscation (the potency) ; 

means for selecting a maximum execution time 
or space penalty (the cost) ; 

means for reading and parsing the input 
15 files; 

means for providing information identifying 
data types, data 3truc:.v..ro3, and control 
structures used by the application to be 
processed; 

20 means for selecting and applying obfuscating 

trans foiTTiat ions to sourc'i- code objecr.a until the 
required potency has been achieved or the maximum 
cost has been eixceeded; a;id 

means for outpu::civ;g the transformed code of 

25 the application. 

15. The apparatus of ClaivTi 13, wherein the 
transf oxtnation comprises an opaque construct, che 
opaque construct being conscructed using aliasing and 
3 0 concurrency techniques. 
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16. The apparatus of Claiim 13, further 
comprising : 

means for outputcing information about 
5 obfuscating transformations applied to the 

obfuscated cede and information relating 
obfuscated code of a trcuxsformed application to 
source code of the application. 

10 17. The apparatus of Claim 13, wherein the 

transformation is selected to preserve the observable 
behavior of uhe code of an application. 

18. The apparatus of Claim 13, furcher 
15 comprising: 

means for deobf uscacing the code, the 
deobfuscating the code ccmprising removing any 
obfuscations from the obfuscated code of an 
application by use of slicing, partial evaluation, 
20 dataflow analysis, or statistical analysis. 

19. The apparatus of Claim 13, wherein the code 
comprises Java'*^ bytecode. 

25 20. The apparatus of Claim 13, wherein the 

transformation provides a data obf uscation, a control 
obfuscation, or a preventive obfuscation. 
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METRIC METRIC NAME CITATION 



Program Length Halstead 



E(F) increases with the number of operators and operands in P. 



)Li2 Cyclomatic Complexity _ McCab e 



E{F) increases with the number of predicates in F. 



Nesting Complexity Harrigon 



E(F) increases with the nesting level of conditionals in F. 



^4 Data Flow Complexity Oviedo 



E{F) incraases with the number of int er-basic bioai; v ariable references in F. 

^ig Fan-in/out C o mplexity 



E(F) increases with the number of formal parameters to F, and with the number 
of global data struc tures read or updated b y F. _ 

M-e Data S tructure Com plexity Munson 

E{F) increases with the complGxry of the static data structures declared in P. 
The complexity of a scalar variable is constant. Tiie complexity of an array 
increases wit!^ tne numbef of in-siisions and wivh -he complexity of the 
element type. Tfie complexity of a record increases with the number and 
complexity of its fields. 



II7 00 Metric fil^l^r'ifl.^il 



E{C) increases with {[l^) the nun-'ber of methods in C 1 (fi the depth 
(distance f re m the root) of 0 in thp irshsritance tree, {\i ^) the number of direct 
3ubcias.ses of C ^ (jx^) the nurnbe'- of other classes to which C is coupled*, 

) the number cf methods lliai ::sn be executGd in response to a message 
sei^.t to an ot)ject of C ^ (^i. i) tne degree to which C s methods do not 
reference th^-! B^erm set of (n^:tr.nr;(:» variable's. Motu: 11 1 measures cohesion; 
i.e., hcv/ strongly related ihs {^ieirents of a modiJle ars. ' 



*Two classes are coupied if one uses the methods or instEnce variables of the other. 



SUBST ITUTE SKlsEr(,^ULE 26) 



wo 99/01815 



PCT/US98/12017 



6/27 



inter- strong full 

trivial weak strong full one-way . procedural 



Proaramrner 
effort 
4- 



inter- 
process 



low 

resilience 



I Global 

high I 
resilience j Local 



full full 



Vt^eak strong 
trivial weak 



f>oiy Exp 
time time 



Deobfus- 

cater 

effort 



Bb 





I 
I 
I 




ill. 



int V, a=5; b--6; 
v=''^=a + b; 



if (b > 6) 



if (random (1,5) <0) F'„ 



int V, a=5; b=6; 

if(...) ... 

. (b is unchanged) 

if (b < 7) ' a++1 

v=="'^=: (a > 5) 7v=b=b; v=b 
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class C { 
method Ml (T1 a) { 
3MI. s^^- 



1 



k ' 



) 

method M2 (T1 b; T2 c) { 

Sk1;...S k2 ; 
1 m 



class C { 
method M(Ti a;T2c; intV){ 
lf{V==:p){Si ;...Sk ;} 
else {Sf;...sl?;} 



} 



} 



} 



{ Cx=newC; 
x.M1(a);x.M2(b,c);} 



{ C'x^newC; 
x.y(a, G, V""^ 1; 



class C { 
method m (int x) 
{Si...Sk} 

} 

{ Cx = newC; 
x.m(8); ... x.m(7); 

} 



class 01 { 
method m (int x) 
{Si 



•S^ } 



method m1 (int x) 

,c 
'1 



} 

class 02 inherits C1 { 
melliOd M (in! x) 
{3? 



} 



{ Clx; 

if (P7 ) x=new C1 else x=new C2; 
x.m(5), ...;x.m1(7); 

} 



FIG. 19 
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FIG. 20a 



FIG. 20b 



FIG. 20c 



for (i=1 ,i<=n,i++) 
for(j=1,j<=n,j++) 
a[TJ]=bD,i] 



for(i=2,i<(n-1),i++) 
a[i] +=a[1-i]==[i+1] 



for(i=:1,i<n,i+-f) { 
a[i]+=c; 

x[i+"ij=cl+x[i+1]-a[IJ 



for(I=1.I<=n, I+=64) 
fcr(J=l.J<=n,J+=64) 
for (i~f,i<=min(I+63,n),i++) 
for (j_J,j<=min(J+65,n),j++) 

a(i,jH - 



} 



for (i=2,i<(n-2),i+=2) { 
a[i]+= n[i-1]=a[i+1]; 
afi+l)+= a[i]=a[i+2]; 

if((i'n-2)%2) = 1) 
xtr|••1]^•= a[R-2]=a[n] 



for (i=1 ,!<n,i++) 

a[i] +^ c; 
for(i=1, kn, i++) 

.{[i+i] <cJ+xli+1]=a[i] 



FIG. 21a 



.2 



FIG. 21c FIG. 21 d 
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(1) boolA,B,C; 

(2) A = True; 

(3) B = False; 

(4) C = False; 

(5) C = A&B; 

(6) C = A& B; 

(7) C = AIB; 

(8) if(A)...; 

(9) if (B) .; 

(10) if(C)...; 



{V) short a1,a2,b1,b2,clc2; 

(2') al=0;a2:=1; 

(3') bi=0:b2=0; 

(4=) Cl=1;c2=1; 

(5') x=AND[2*a-:+a2,2^b1+b2]; c1=x/2; c2=x%2; 

(6') 01^31^^32) &(b1^b2);c2=0 

(7') x=OR(2*a1+82,2*L)1+b2]; c1=x/2; c2=x%2; 

^3'} x=2*a1-'-a2; if .'fet) II (x==2) ...; 

(9') If(b1'^b2)...: 

1^ ifA/AL[c1,c2"i: ...: 



SUBSTITUTE SHF:;";T(?i»JI.S; 26) 



wo 99/01815 



PCT/US98/12017 



16/27 

String G (int n) { 
int i=0,k; 
String B; 
while (i) { 

L1: if (nr:=1) {S{i++]="A";k=0;goto L6}; 
L2: if (n==2) {SO i t j='B";k= -2 ;goto L6}; 
L3: if (n==3) {Sn4+h=T;";goto L8}; 
L4: if (n==4) {Sii++]="K";goto L9}; 
L5: if (n=5) {S[i+-ri=^''C";goto L11}; 

if (n>12)goto L1; 
L6: if (k++<=2) (3[i+-hl="A";goto L6} elae goto L8; 
L8: returns; 
L9: S[i++]="C";gotoL10; 
L10: SP++KB'; go-io L8; 
U1: S{i++]="C";(,otoL12; 
L12: goto LI 0; 



Z(X + r,Y) = 2'"-Y + (r + X) ==Z(X,Y) + r 

Z(X,Y + r) := l^"^- (Y + r) i X Z{X,Y) + r • 2 

Z{X . r, Y) = 2 22 . Y ^. - . . ::/x,Y) + (r • 1 } X ^ 

Z(X,Y-r) =2^2.Y.r + X =:Z(X,Y) + (r-1)-22^ ■ Y 



FIG. 23a 



FIG. 23b 



(2) X+=5; 

(3) Y+=11; 

(4) X* = Ci 

(5) Y* = d; 



(1) intX=45,Y=95; 



(1 ') long Z--.- 1 677690661 1 9551 045; 

(?.')Z+=r5; 



^ f3') Z+= 47244640256; 

^ (4) z -v.: (C'1)*(Z& 4294967295); 

(5 i Z v:^ (ci-1) . {Z & 18446744069414584320); 
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(1) intA[9]; 

(2) A[i] = ...; 



(3) intB[8],C[19]; 

(4) B[i] = ...; 

(5) C[i] = ...; 

(6) intD[9] 

(7) for(i=o;i<=B;i++) 

D[i]=2*D[i+1]; 



(8) intE[2,2]; 

(9) for{i=Q;i<=2;i ++) 

for(j=0;i<=2;i++) 
swap(E(i,j], ED.i]); 



(1') intA1[4A2[4]; 
(2') if((i%2)=0)A1[i/2]=... 
eiseA2[i/2]=...; 

(3') intBC[20]; 
(4') BC[3*i] = ...; 
(5') BC[i/2.3+1+i%2] = ...; 
(6') intD1[1,4j; 
(7') for(j=0;j<=1;j++) 
for/k-0;k<-4;k++) 
if 

D10,k]=2*D1D+1.O]; 
else 

D10.k]=:2*D10.k+1]; 

(8') intElfS] 

19') for(i.-/j;1<:.8;i++) 

swfip(E[i], Ef3=(i%3)+i/3]); 



FIG, 24a 
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Node g, h; 

method P(...,Node f) { 
/ * 1 */ g = g.Move(); 

h = h.MoveO; 
/ * 2 */ h = h.lnsert(new Node); 

/*3 */ x.R(...,f.Move()); 

/*4 »/ !f(f==g)?.. 

/ » 6 J hTDkon=False; 

g. Tok8r:=True; 

/*7 */ if , 'f Token)?... 

/*?. J fTo\ien~True; 

h. 'roken=False; 

/ ♦ 8 ./ it 0 -Token)"'" ... 

} 



program M; 
'"if(F- ... 
end M. 

"IP 

Output 



input 





program M'; 
if (True) ... 
end W. 



<Jdentical?' 



4^ 

Output' 



SUBSTITUTE SHf:ET (RULE 28) 



FSG. 30 



wo 99/01815 



PCT/US98/12017 



22J27 



thread S { 

intR; 
• while (1) { 

R = random(1,C); 

X = R*R; 

sleep(3); 

} 

} 



thread T { 
int R; 
while (i) { 

R = random(1,C); 

X = 7*R*R; 

sleep(2): 

X*=X; 

sleep(5) 

} 



intX, Y; 

const C =; sqrt(maxint)/10; 
main () { 
S.runQ; T.run(); 

if ((y"i)==X) 



{ 



if {p7) 



else 

(a) 

=> if (Q''') 



else 
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if (p^) 



else 



if (True) 



else 



if (True) 
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if (RF) 


if (False) 


if (False) 
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